# Compatibility Notes

## Critical Version Constraints

### Python
- **Python 3.9-3.11**: HF Spaces typically supports these versions
- Avoid Python 3.12+ for maximum compatibility

### PyTorch
- **PyTorch 2.1.x**: latest stable line with good HF ecosystem support
- Use CPU-only builds for ZeroGPU deployments

### Transformers
- **Transformers 4.35.x**: recent features with proven stability
- Ensures compatibility with current HF models

### Gradio
- **Gradio 4.x**: current major version with mobile optimizations
- Required for the mobile-responsive interface

## HF Spaces Specific Considerations

### ZeroGPU Environment
- **Limited GPU memory**: CPU-optimized package builds are used
- All models run on CPU
- Use `faiss-cpu` instead of `faiss-gpu`

### Storage Limits
- **Limited persistent storage**: efficient caching is crucial
- Session data must be kept as small as possible
- Implement aggressive cleanup policies

### Network Restrictions
- **External API calls may be restricted**
- All LLM calls must go through the Hugging Face Inference API
- Avoid other external HTTP requests in production

## Model Selection

### For ZeroGPU
- **Embedding model**: `sentence-transformers/all-MiniLM-L6-v2` (384-dimensional, fast on CPU)
- **Primary LLM**: call the HF Inference API (see the `InferenceClient` sketch under Example Sketches below)
- **Avoid local model loading** for large models

### Memory Optimization
- Limit concurrent requests
- Use streaming responses
- Implement response compression

## Performance Considerations

### Cache Strategy
- In-memory caching for active sessions
- Aggressive cache eviction (LRU)
- TTL-based expiration (a combined LRU + TTL sketch appears under Example Sketches below)

### Mobile Optimization
- Reduced max tokens for mobile (800 vs. 2000)
- Shorter timeout (15 s vs. 30 s)
- Lazy loading of UI components

## Dependencies Compatibility Matrix

A matching `requirements.txt` sketch appears under Example Sketches below.

| Package | Version Range | Notes |
|---------|---------------|-------|
| Python | 3.9-3.11 | HF Spaces supported versions |
| PyTorch | 2.1.x | CPU build |
| Transformers | 4.35.x | Latest stable |
| Gradio | 4.x | Mobile support |
| FAISS | CPU-only (`faiss-cpu`) | No GPU support on ZeroGPU |
| NumPy | 1.24.x | Pinned to avoid ABI issues with the compiled packages above |

## Known Issues & Workarounds

### Issue: FAISS GPU Not Available
**Solution**: Use `faiss-cpu` in `requirements.txt`

### Issue: Model Loading Exhausts Memory
**Solution**: Use the HF Inference API instead of loading models locally

### Issue: Session Storage Limits
**Solution**: Implement data compression and TTL-based cleanup

### Issue: Concurrent Request Limits
**Solution**: Implement a request queue with a `max_workers` limit (sketched under Example Sketches below)

## Testing Recommendations

1. Test in the ZeroGPU environment before production
2. Verify memory usage stays under 512 MB (a memory-check snippet appears below)
3. Test mobile responsiveness
4. Validate cache efficiency (target: >60% hit rate)
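
## Example Sketches

The snippets below are minimal sketches of the techniques referenced above, not a definitive implementation; any model ID, version pin, or parameter marked as illustrative is an assumption rather than a value taken from this project.

A `requirements.txt` consistent with the compatibility matrix might look like this (the exact patch versions are illustrative; pick the latest release within each range):

```text
# requirements.txt — CPU-only stack for HF Spaces / ZeroGPU
torch==2.1.2                     # 2.1.x line; resolves to a CPU build on CPU hardware
transformers==4.35.2             # 4.35.x line
gradio>=4.0,<5.0                 # Gradio 4.x for the mobile-responsive UI
faiss-cpu==1.7.4                 # never faiss-gpu on ZeroGPU
numpy==1.24.4                    # 1.24.x pin
sentence-transformers>=2.3,<3.0  # MiniLM embedder
huggingface_hub>=0.19            # InferenceClient for remote LLM calls
```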
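
Routing generation through the Inference API keeps large model weights off the Space entirely. A minimal sketch using `huggingface_hub`'s `InferenceClient`; the model ID is illustrative, while the 800/2000-token and 15 s/30 s mobile-vs-desktop budgets come from the Mobile Optimization notes above:

```python
from huggingface_hub import InferenceClient

MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"  # illustrative; use your hosted model

# Separate clients so mobile requests get the shorter timeout.
desktop_client = InferenceClient(model=MODEL_ID, timeout=30)
mobile_client = InferenceClient(model=MODEL_ID, timeout=15)

def generate(prompt: str, mobile: bool = False) -> str:
    """Generate text remotely; no local model weights are loaded."""
    client = mobile_client if mobile else desktop_client
    return client.text_generation(prompt, max_new_tokens=800 if mobile else 2000)
```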
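
For retrieval, the MiniLM embedder and `faiss-cpu` run comfortably on CPU. A sketch with an illustrative two-document corpus:

```python
import faiss  # provided by the faiss-cpu package
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

docs = ["First passage of the corpus.", "Second passage of the corpus."]
vectors = encoder.encode(docs, normalize_embeddings=True)  # float32, shape (2, 384)

index = faiss.IndexFlatIP(384)  # inner product == cosine after normalization
index.add(vectors)

query = encoder.encode(["example query"], normalize_embeddings=True)
scores, ids = index.search(query, k=2)  # top-2 nearest documents
```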
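
The session cache combines LRU eviction with TTL expiry, per the Cache Strategy notes. A minimal sketch; the 256-entry capacity and 15-minute TTL are illustrative:

```python
import time
from collections import OrderedDict

class SessionCache:
    def __init__(self, max_items: int = 256, ttl_seconds: float = 900.0):
        self._data: OrderedDict[str, tuple[float, object]] = OrderedDict()
        self._max = max_items
        self._ttl = ttl_seconds

    def get(self, key: str):
        entry = self._data.get(key)
        if entry is None:
            return None
        ts, value = entry
        if time.monotonic() - ts > self._ttl:  # expired: drop and treat as a miss
            del self._data[key]
            return None
        self._data.move_to_end(key)            # mark as most recently used
        return value

    def put(self, key: str, value) -> None:
        self._data[key] = (time.monotonic(), value)
        self._data.move_to_end(key)
        while len(self._data) > self._max:     # LRU eviction from the cold end
            self._data.popitem(last=False)
```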
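
For the concurrent-request workaround, a small worker pool makes excess requests queue instead of exhausting memory. The `max_workers=2` value is illustrative, and `generate` is the function from the Inference API sketch above:

```python
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)  # tune to the Space's CPU/memory budget

def handle_request(prompt: str, mobile: bool = False) -> str:
    # submit() enqueues the call when both workers are busy;
    # result() raises TimeoutError if the budget is exceeded.
    future = executor.submit(generate, prompt, mobile)
    return future.result(timeout=15 if mobile else 30)
```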
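
Finally, for testing recommendation 2, a quick resident-memory assertion, assuming `psutil` is available in the test environment:

```python
import os
import psutil

def test_memory_budget():
    # Resident set size of the current process, in megabytes.
    rss_mb = psutil.Process(os.getpid()).memory_info().rss / (1024 * 1024)
    assert rss_mb < 512, f"RSS {rss_mb:.0f} MB exceeds the 512 MB budget"
```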