Deployment Checklist - ZeroGPU Integration

✅ Pre-Deployment Verification

Code Status

  • ✅ All code changes committed and pushed
  • ✅ FAISS-GPU implementation complete
  • ✅ Lazy-loaded local model fallback implemented
  • ✅ ZeroGPU API integration complete
  • ✅ Dockerfile configured correctly
  • ✅ requirements.txt updated with faiss-gpu

Files Ready

  • ✅ Dockerfile - Configured for HF Spaces
  • ✅ main.py - Entry point for HF Spaces
  • ✅ requirements.txt - All dependencies including faiss-gpu
  • ✅ README.md - Contains HF Spaces configuration

🚀 Deployment Steps

1. Verify Repository Status

git status  # Should show clean or only documentation changes
git log --oneline -5  # Verify recent commits are pushed

2. Hugging Face Spaces Configuration

Space Settings

  1. Go to: https://huggingface.co/spaces/JatinAutonomousLabs/Research_AI_Assistant
  2. Navigate to Settings → Repository secrets

Required Environment Variables

Basic Configuration:

HF_TOKEN=your_huggingface_token_here

ZeroGPU API Configuration (Optional - for Runpod integration):

Option A: Service Account Mode

USE_ZERO_GPU=true
ZERO_GPU_API_URL=https://bm9njt1ypzvuqw-8000.proxy.runpod.net
ZERO_GPU_EMAIL=service@example.com
ZERO_GPU_PASSWORD=your-password

Option B: Per-User Mode (Multi-tenant)

USE_ZERO_GPU=true
ZERO_GPU_PER_USER_MODE=true
ZERO_GPU_API_URL=https://bm9njt1ypzvuqw-8000.proxy.runpod.net
ZERO_GPU_ADMIN_EMAIL=admin@example.com
ZERO_GPU_ADMIN_PASSWORD=admin-password

Note: Runpod proxy URLs follow the format: https://<pod-id>-8000.proxy.runpod.net
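
For an automated sanity check, a minimal sketch of validating that format before startup (the validate_runpod_url helper is illustrative, not part of the codebase):

import re

# Runpod proxy URLs: https://<pod-id>-8000.proxy.runpod.net, pod IDs lowercase alphanumeric
RUNPOD_URL_PATTERN = re.compile(r"^https://[a-z0-9]+-8000\.proxy\.runpod\.net/?$")

def validate_runpod_url(url: str) -> bool:
    return bool(RUNPOD_URL_PATTERN.match(url))

assert validate_runpod_url("https://bm9njt1ypzvuqw-8000.proxy.runpod.net")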

Additional Optional Variables:

DB_PATH=sessions.db
LOG_LEVEL=INFO
MAX_WORKERS=4
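
For reference, a minimal sketch of reading these variables with the defaults above (names match this checklist; the actual loading code in main.py may differ):

import os

DB_PATH = os.getenv("DB_PATH", "sessions.db")
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
MAX_WORKERS = int(os.getenv("MAX_WORKERS", "4"))
USE_ZERO_GPU = os.getenv("USE_ZERO_GPU", "false").lower() == "true"
ZERO_GPU_API_URL = os.getenv("ZERO_GPU_API_URL", "")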

3. Hardware Selection

In HF Spaces Settings:

  • GPU: NVIDIA T4 Medium (recommended)
    • 16GB VRAM (sufficient for local model fallback)
    • 30GB RAM
    • 8 vCPU

Note: With ZeroGPU API enabled, GPU is only needed for:

  • FAISS-GPU vector search (automatic CPU fallback if GPU unavailable; see the sketch below)
  • Local model fallback (only loads if ZeroGPU fails)

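A minimal sketch of that CPU-fallback pattern using the standard faiss API (illustrative; the project's actual index type and wiring may differ):

import faiss

def build_index(dim: int):
    """Build a flat L2 index, moving it to the GPU when one is usable."""
    index = faiss.IndexFlatL2(dim)  # exact nearest-neighbor search
    try:
        if hasattr(faiss, "StandardGpuResources") and faiss.get_num_gpus() > 0:
            res = faiss.StandardGpuResources()
            return faiss.index_cpu_to_gpu(res, 0, index)  # move index to device 0
    except Exception:
        pass  # any GPU initialization failure falls through to CPU
    print("FAISS GPU not available, using CPU")
    return index
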
4. Deployment Process

Automatic Deployment:

  1. Code is already pushed to main branch
  2. HF Spaces will automatically:
    • Detect sdk: docker in README.md
    • Build Docker image from Dockerfile
    • Install dependencies from requirements.txt
    • Start application using main.py

Manual Trigger (if needed):

  • Go to Space → Settings → Restart this Space

5. Monitor Deployment

Check Build Logs:

  • Navigate to Space → Logs
  • Watch for:
    • ✅ Docker build success
    • ✅ Dependencies installed (including faiss-gpu)
    • ✅ Application startup
    • ✅ ZeroGPU client initialization (if configured)
    • ✅ Local model loader initialized (as fallback)

Expected Startup Messages:

✓ Local model loader initialized (models will load on-demand as fallback)
✓ ZeroGPU API client initialized (service account mode)
✓ FAISS GPU resources initialized
✓ Application ready for launch

6. Verify Deployment

Health Check:

  • Application should be accessible at: https://huggingface.co/spaces/JatinAutonomousLabs/Research_AI_Assistant
  • Health endpoint: /health should return {"status": "healthy"}
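
A scripted version of this check (the direct .hf.space URL below is inferred from the Space name and is an assumption; adjust if the Space uses a different subdomain):

import requests

SPACE_URL = "https://jatinautonomouslabs-research-ai-assistant.hf.space"  # assumed direct app URL

resp = requests.get(f"{SPACE_URL}/health", timeout=10)
resp.raise_for_status()
assert resp.json() == {"status": "healthy"}
print("Health check passed")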

Test ZeroGPU Integration:

  1. Send a test message through the UI
  2. Check logs for: "Inference complete for {task_type} (ZeroGPU API)"
  3. Verify no local models are loaded (if ZeroGPU is working)

Test Fallback:

  1. Temporarily disable ZeroGPU (set USE_ZERO_GPU=false)
  2. Send a test message
  3. Check logs for: "Lazy loading local model {model_id} as fallback"
  4. Verify local model loads and works

πŸ” Post-Deployment Verification

1. Check Application Status

  • Application loads without errors
  • UI is accessible
  • Health check endpoint responds

2. Verify ZeroGPU Integration

  • ZeroGPU client initializes (if configured)
  • API calls succeed
  • No local models loaded (if ZeroGPU is working)
  • Usage statistics accessible (if per-user mode)

3. Verify FAISS-GPU

  • FAISS GPU resources initialize
  • Vector search works
  • Falls back to CPU if GPU unavailable

4. Verify Fallback Chain

  • ZeroGPU API tried first
  • Local models load only if ZeroGPU fails
  • HF Inference API used as final fallback
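
A minimal sketch of that chain as a generic try-in-order helper (function and backend names are hypothetical stand-ins, not the project's actual interfaces):

from typing import Callable, Sequence

def run_inference(prompt: str, backends: Sequence[Callable[[str], str]]) -> str:
    """Try each backend in order; return the first successful result."""
    last_error = None
    for backend in backends:
        try:
            return backend(prompt)
        except Exception as exc:  # a failed backend falls through to the next
            last_error = exc
    raise RuntimeError("All inference backends failed") from last_error

# Usage, in the order listed above (names are hypothetical):
# result = run_inference(prompt, [zero_gpu_infer, local_model_infer, hf_api_infer])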

5. Monitor Resource Usage

  • GPU memory usage is low (if ZeroGPU is working)
  • CPU usage is reasonable
  • No memory leaks
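
For a quick spot check from a console, a minimal sketch using torch, which is already installed for the local-model fallback (with ZeroGPU working, expect usage near zero beyond the FAISS index):

import torch

if torch.cuda.is_available():
    used_gb = torch.cuda.memory_allocated() / 1024**3
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU memory: {used_gb:.2f} / {total_gb:.2f} GB")
else:
    print("No GPU visible; nothing to monitor")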

πŸ› Troubleshooting

Issue: Build Fails

Check:

  • Dockerfile syntax is correct
  • requirements.txt has all dependencies
  • Python 3.10 is available

Solution:

  • Review build logs in HF Spaces
  • Test Docker build locally: docker build -t test .

Issue: ZeroGPU Not Working

Check:

  • Environment variables are set correctly
  • ZeroGPU API is accessible from HF Spaces
  • Network connectivity to Runpod

Solution:

  • Verify API URL is correct
  • Check credentials are valid
  • Review ZeroGPU API logs

Issue: FAISS-GPU Not Available

Check:

  • GPU is available in HF Spaces
  • faiss-gpu package installed correctly

Solution:

  • System will automatically fall back to CPU
  • Check logs for: "FAISS GPU not available, using CPU"

Issue: Local Models Not Loading

Check:

  • use_local_models=True in code
  • Transformers/torch available
  • GPU memory sufficient

Solution:

  • Check logs for initialization errors
  • Verify GPU availability
  • Models will only load if ZeroGPU fails
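
A minimal sketch of the lazy-loading pattern (the project's actual loader lives elsewhere; the function name and pipeline task here are illustrative):

from functools import lru_cache

@lru_cache(maxsize=None)
def get_local_pipeline(model_id: str):
    """Load a transformers pipeline on first use only, then cache it."""
    # Importing inside the function keeps startup light while ZeroGPU is healthy
    from transformers import pipeline
    print(f"Lazy loading local model {model_id} as fallback")
    return pipeline("text-generation", model=model_id, device_map="auto")  # device_map needs accelerate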

📊 Expected Resource Usage

With ZeroGPU API Enabled (Optimal)

  • GPU Memory: ~0-500MB (FAISS-GPU only, no local models)
  • CPU: Low (API calls only)
  • RAM: ~2-4GB (application + caching)

With ZeroGPU Failing (Fallback Active)

  • GPU Memory: ~15GB (local models loaded)
  • CPU: Medium (model inference)
  • RAM: ~4-6GB (models + application)

FAISS-GPU Usage

  • GPU Memory: ~100-500MB (depending on index size)
  • CPU Fallback: Automatic if GPU unavailable

✅ Deployment Complete

Once all checks pass:

  • ✅ Application is live
  • ✅ ZeroGPU integration working
  • ✅ FAISS-GPU accelerated
  • ✅ Fallback chain operational
  • ✅ Monitoring in place

Next Steps:

  • Monitor usage statistics
  • Review ZeroGPU API logs
  • Optimize based on usage patterns
  • Scale as needed

Last Updated: 2025-01-07
Deployment Status: Ready
Version: With ZeroGPU Integration + FAISS-GPU + Lazy Loading