Deployment Checklist - ZeroGPU Integration
✅ Pre-Deployment Verification
Code Status
- ✅ All code changes committed and pushed
- ✅ FAISS-GPU implementation complete
- ✅ Lazy-loaded local model fallback implemented
- ✅ ZeroGPU API integration complete
- ✅ Dockerfile configured correctly
- ✅ requirements.txt updated with faiss-gpu
Files Ready
- ✅ `Dockerfile` - Configured for HF Spaces
- ✅ `main.py` - Entry point for HF Spaces
- ✅ `requirements.txt` - All dependencies including faiss-gpu
- ✅ `README.md` - Contains HF Spaces configuration
Deployment Steps
1. Verify Repository Status
git status # Should show clean or only documentation changes
git log --oneline -5 # Verify recent commits are pushed
2. Hugging Face Spaces Configuration
Space Settings
- Go to: https://huggingface.co/spaces/JatinAutonomousLabs/Research_AI_Assistant
- Navigate to Settings → Repository secrets
Required Environment Variables
Basic Configuration:
HF_TOKEN=your_huggingface_token_here
ZeroGPU API Configuration (Optional - for Runpod integration):
Option A: Service Account Mode
USE_ZERO_GPU=true
ZERO_GPU_API_URL=https://bm9njt1ypzvuqw-8000.proxy.runpod.net
ZERO_GPU_EMAIL=service@example.com
ZERO_GPU_PASSWORD=your-password
Option B: Per-User Mode (Multi-tenant)
USE_ZERO_GPU=true
ZERO_GPU_PER_USER_MODE=true
ZERO_GPU_API_URL=https://bm9njt1ypzvuqw-8000.proxy.runpod.net
ZERO_GPU_ADMIN_EMAIL=admin@example.com
ZERO_GPU_ADMIN_PASSWORD=admin-password
Note: Runpod proxy URLs follow the format: https://<pod-id>-8000.proxy.runpod.net
Additional Optional Variables:
DB_PATH=sessions.db
LOG_LEVEL=INFO
MAX_WORKERS=4
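The application is expected to read these settings from the environment at startup. A minimal sketch of that pattern, using the variable names listed above (defaults are illustrative, not the project's actual values):

```python
import os

# Illustrative only: how the app could read the Space's repository secrets.
USE_ZERO_GPU = os.getenv("USE_ZERO_GPU", "false").lower() == "true"
ZERO_GPU_API_URL = os.getenv("ZERO_GPU_API_URL", "")
ZERO_GPU_PER_USER_MODE = os.getenv("ZERO_GPU_PER_USER_MODE", "false").lower() == "true"
DB_PATH = os.getenv("DB_PATH", "sessions.db")
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
MAX_WORKERS = int(os.getenv("MAX_WORKERS", "4"))
```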
3. Hardware Selection
In HF Spaces Settings:
- GPU: NVIDIA T4 Medium (recommended)
  - 16GB VRAM (sufficient for local model fallback)
  - 30GB RAM
  - 8 vCPU
Note: With ZeroGPU API enabled, GPU is only needed for:
- FAISS-GPU vector search (automatic CPU fallback if GPU unavailable)
- Local model fallback (only loads if ZeroGPU fails)
4. Deployment Process
Automatic Deployment:
- Code is already pushed to the `main` branch
- HF Spaces will automatically:
  - Detect `sdk: docker` in README.md
  - Build Docker image from Dockerfile
  - Install dependencies from requirements.txt
  - Start application using `main.py`
Manual Trigger (if needed):
- Go to Space → Settings → Restart this Space
5. Monitor Deployment
Check Build Logs:
- Navigate to Space → Logs
- Watch for:
  - ✅ Docker build success
  - ✅ Dependencies installed (including faiss-gpu)
  - ✅ Application startup
  - ✅ ZeroGPU client initialization (if configured)
  - ✅ Local model loader initialized (as fallback)
Expected Startup Messages:
✅ Local model loader initialized (models will load on-demand as fallback)
✅ ZeroGPU API client initialized (service account mode)
✅ FAISS GPU resources initialized
✅ Application ready for launch
6. Verify Deployment
Health Check:
- Application should be accessible at: https://huggingface.co/spaces/JatinAutonomousLabs/Research_AI_Assistant
- Health endpoint: `/health` should return `{"status": "healthy"}`
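The health endpoint can also be exercised from outside the Space. A minimal sketch; the `BASE_URL` below is an assumption, so substitute the direct app URL shown under the Space's Embed settings:

```python
import requests

# Assumption: Docker Spaces are typically served from a *.hf.space domain
# (shown under "Embed this Space"); replace the placeholder accordingly.
BASE_URL = "https://<your-space-subdomain>.hf.space"

resp = requests.get(f"{BASE_URL}/health", timeout=30)
resp.raise_for_status()
assert resp.json().get("status") == "healthy"
print("Health check passed:", resp.json())
```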
Test ZeroGPU Integration:
- Send a test message through the UI
- Check logs for: `"Inference complete for {task_type} (ZeroGPU API)"`
- Verify no local models are loaded (if ZeroGPU working)
Test Fallback:
- Temporarily disable ZeroGPU (set `USE_ZERO_GPU=false`)
- Send a test message
- Check logs for: `"Lazy loading local model {model_id} as fallback"`
- Verify that the local model loads and works
Post-Deployment Verification
1. Check Application Status
- Application loads without errors
- UI is accessible
- Health check endpoint responds
2. Verify ZeroGPU Integration
- ZeroGPU client initializes (if configured)
- API calls succeed
- No local models loaded (if ZeroGPU working)
- Usage statistics accessible (if per-user mode)
3. Verify FAISS-GPU
- FAISS GPU resources initialize
- Vector search works
- Falls back to CPU if GPU unavailable
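The GPU-with-CPU-fallback behaviour can be checked in isolation with a small sketch like the one below (illustrative; the project's actual index wiring may differ):

```python
import faiss
import numpy as np

dim = 384
index = faiss.IndexFlatL2(dim)                         # CPU index
index.add(np.random.rand(1000, dim).astype("float32"))

try:
    # StandardGpuResources only exists in faiss-gpu builds with a visible GPU.
    res = faiss.StandardGpuResources()
    index = faiss.index_cpu_to_gpu(res, 0, index)
    print("FAISS GPU resources initialized")
except AttributeError:
    print("FAISS GPU not available, using CPU")

# Search behaves the same on either device.
distances, ids = index.search(np.random.rand(1, dim).astype("float32"), 5)
```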
4. Verify Fallback Chain
- ZeroGPU API tried first
- Local models load only if ZeroGPU fails
- HF Inference API used as final fallback
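The intended ordering can be sanity-checked with a small driver like this sketch (the function and backend names are hypothetical, not the project's actual API):

```python
from typing import Callable

def run_inference(prompt: str,
                  zero_gpu_call: Callable[[str], str],
                  local_model_call: Callable[[str], str],
                  hf_api_call: Callable[[str], str]) -> str:
    """Try ZeroGPU first, then the lazily loaded local model, then the HF Inference API."""
    backends = [
        ("ZeroGPU API", zero_gpu_call),
        ("local model fallback", local_model_call),
        ("HF Inference API", hf_api_call),
    ]
    for name, call in backends:
        try:
            result = call(prompt)
            print(f"Inference complete via {name}")
            return result
        except Exception as exc:
            print(f"{name} failed: {exc}; trying next backend")
    raise RuntimeError("All inference backends failed")
```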
5. Monitor Resource Usage
- GPU memory usage is low (if ZeroGPU working)
- CPU usage is reasonable
- No memory leaks
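With ZeroGPU handling inference, GPU memory should stay near the FAISS-only footprint. A quick check from a Python shell inside the container (a sketch, assuming torch is installed as part of the local fallback dependencies):

```python
import torch

if torch.cuda.is_available():
    used_mib = torch.cuda.memory_allocated() / 1024 ** 2
    total_mib = torch.cuda.get_device_properties(0).total_memory / 1024 ** 2
    print(f"GPU memory in use: {used_mib:.0f} MiB / {total_mib:.0f} MiB")
else:
    print("No GPU visible; FAISS and local models are running on CPU")
```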
Troubleshooting
Issue: Build Fails
Check:
- Dockerfile syntax is correct
- requirements.txt has all dependencies
- Python 3.10 is available
Solution:
- Review build logs in HF Spaces
- Test Docker build locally:
docker build -t test .
Issue: ZeroGPU Not Working
Check:
- Environment variables are set correctly
- ZeroGPU API is accessible from HF Spaces
- Network connectivity to Runpod
Solution:
- Verify API URL is correct
- Check credentials are valid
- Review ZeroGPU API logs
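If the configuration looks correct, a quick reachability check from the Space's environment can narrow things down. A sketch; it only confirms network reachability of the configured URL, not authentication:

```python
import os
import requests

# ZERO_GPU_API_URL is read from the Space's repository secrets.
api_url = os.getenv("ZERO_GPU_API_URL", "")
try:
    resp = requests.get(api_url, timeout=10)
    print(f"ZeroGPU API reachable: HTTP {resp.status_code}")
except requests.RequestException as exc:
    print(f"ZeroGPU API unreachable: {exc}")
```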
Issue: FAISS-GPU Not Available
Check:
- GPU is available in HF Spaces
- faiss-gpu package installed correctly
Solution:
- System will automatically fall back to CPU
- Check logs for: `"FAISS GPU not available, using CPU"`
Issue: Local Models Not Loading
Check:
- `use_local_models=True` in code
- Transformers/torch available
- GPU memory sufficient
Solution:
- Check logs for initialization errors
- Verify GPU availability
- Models will only load if ZeroGPU fails
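For reference, the lazy-loading pattern looks roughly like the sketch below (the function name and cache are illustrative; the project's actual loader may differ):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

_model_cache: dict = {}

def get_local_model(model_id: str):
    """Load a local model only on first use, keeping GPU memory free while ZeroGPU works."""
    if model_id not in _model_cache:
        print(f"Lazy loading local model {model_id} as fallback")
        device = "cuda" if torch.cuda.is_available() else "cpu"
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForCausalLM.from_pretrained(
            model_id,
            torch_dtype=torch.float16 if device == "cuda" else torch.float32,
        ).to(device)
        _model_cache[model_id] = (tokenizer, model)
    return _model_cache[model_id]
```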
Expected Resource Usage
With ZeroGPU API Enabled (Optimal)
- GPU Memory: ~0-500MB (FAISS-GPU only, no local models)
- CPU: Low (API calls only)
- RAM: ~2-4GB (application + caching)
With ZeroGPU Failing (Fallback Active)
- GPU Memory: ~15GB (local models loaded)
- CPU: Medium (model inference)
- RAM: ~4-6GB (models + application)
FAISS-GPU Usage
- GPU Memory: ~100-500MB (depending on index size)
- CPU Fallback: Automatic if GPU unavailable
✅ Deployment Complete
Once all checks pass:
- ✅ Application is live
- ✅ ZeroGPU integration working
- ✅ FAISS-GPU acceleration active
- ✅ Fallback chain operational
- ✅ Monitoring in place
Next Steps:
- Monitor usage statistics
- Review ZeroGPU API logs
- Optimize based on usage patterns
- Scale as needed
Last Updated: 2025-01-07
Deployment Status: Ready
Version: With ZeroGPU Integration + FAISS-GPU + Lazy Loading