# Compatibility Notes
## Critical Version Constraints
### Python
- **Python 3.9-3.11**: HF Spaces typically supports these versions
- Avoid Python 3.12+ for maximum compatibility
### PyTorch
- **PyTorch 2.1.x**: latest stable line with good HF ecosystem support
- Use CPU-only builds for ZeroGPU deployments
### Transformers
- **Transformers 4.35.x**: recent features with stability
- Ensures compatibility with current HF models
### Gradio
- **Gradio 4.x**: current major version with mobile optimizations
- Required for the mobile-responsive interface; a pinned requirements sketch follows below
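Taken together, the constraints above translate into a pinned requirements file. A minimal sketch with illustrative pins (the exact patch versions are assumptions; prefer the newest patch release inside each allowed range):
```text
# requirements.txt -- illustrative pins for a CPU-only HF Spaces deployment
torch==2.1.2             # prefer a CPU-only wheel (e.g. from the PyTorch CPU index) to cut image size
transformers==4.35.2
gradio==4.8.0            # any 4.x release with the mobile-responsive components
sentence-transformers==2.2.2
faiss-cpu==1.7.4         # never faiss-gpu on ZeroGPU
numpy==1.24.4
huggingface_hub>=0.19    # InferenceClient for remote LLM calls
```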
## HF Spaces Specific Considerations
### ZeroGPU Environment
- **Limited GPU memory**: CPU-optimized package versions are used
- All models run on CPU
- Use `faiss-cpu` instead of `faiss-gpu` (see the index sketch below)
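A minimal sketch of building a CPU-only FAISS index over 384-dimensional embeddings (the dimension matches `all-MiniLM-L6-v2`; the random vectors are placeholders for real embeddings):
```python
import faiss  # provided by the faiss-cpu package
import numpy as np

DIM = 384  # embedding size of all-MiniLM-L6-v2

# Placeholder data standing in for real sentence embeddings.
vectors = np.random.rand(1000, DIM).astype("float32")
faiss.normalize_L2(vectors)     # normalized vectors make inner product == cosine similarity

index = faiss.IndexFlatIP(DIM)  # exact inner-product search, CPU only
index.add(vectors)

scores, ids = index.search(vectors[:1], k=5)  # smoke test: query with the first vector
print(ids[0], scores[0])
```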
### Storage Limits
- **Limited persistent storage**: efficient caching is crucial
- Session data must be optimized for minimal storage usage
- Implement aggressive cleanup policies, as in the compression-plus-TTL sketch below
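One possible shape for that policy is zlib compression plus a TTL sweep. A minimal sketch, assuming an in-memory store and a 30-minute TTL (both are illustrative choices):
```python
import json
import time
import zlib
from typing import Optional

SESSION_TTL = 30 * 60  # assumed 30-minute time-to-live; tune per deployment

_sessions: dict = {}  # session_id -> (created_at, compressed JSON payload)

def save_session(session_id: str, data: dict) -> None:
    """Store session state compressed to keep the storage footprint small."""
    _sessions[session_id] = (time.time(), zlib.compress(json.dumps(data).encode()))

def load_session(session_id: str) -> Optional[dict]:
    entry = _sessions.get(session_id)
    if entry is None or time.time() - entry[0] > SESSION_TTL:
        _sessions.pop(session_id, None)  # expired or missing: clean up eagerly
        return None
    return json.loads(zlib.decompress(entry[1]))

def sweep_expired() -> None:
    """Aggressive cleanup pass; run periodically or on each request."""
    now = time.time()
    for sid in [s for s, (t, _) in _sessions.items() if now - t > SESSION_TTL]:
        del _sessions[sid]
```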
### Network Restrictions
- **May have restrictions on external API calls**
- All LLM calls must go through the Hugging Face Inference API
- Avoid other external HTTP requests in production (see the client sketch below)
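A minimal sketch of routing all generation through the HF Inference API via `huggingface_hub.InferenceClient` (the model ID is an illustrative assumption; the token comes from a Spaces secret):
```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # assumed model; any hosted text-generation model works
    token=os.environ.get("HF_TOKEN"),            # Spaces secret, never hard-coded
)

def generate(prompt: str, max_new_tokens: int = 800) -> str:
    # One HTTPS call to HF infrastructure; no other outbound requests required.
    return client.text_generation(prompt, max_new_tokens=max_new_tokens)
```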
## Model Selection
### For ZeroGPU
- **Embedding model**: `sentence-transformers/all-MiniLM-L6-v2` (384-dim, fast; loading sketch below)
- **Primary LLM**: call an HF Inference API endpoint rather than hosting the model
- **Avoid local model loading** for large models
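The embedding model is small enough to load locally on CPU; a minimal loading sketch:
```python
from sentence_transformers import SentenceTransformer

# Small model, fast on CPU; downloaded once and cached by the HF Hub.
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

texts = ["example query", "example document"]
embeddings = embedder.encode(texts, normalize_embeddings=True)
print(embeddings.shape)  # (2, 384)
```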
### Memory Optimization
- Limit concurrent requests
- Use streaming responses (see the sketch after this list)
- Implement response compression
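Streaming keeps only the partial response in memory and improves perceived latency. A minimal sketch built on the same InferenceClient (with `stream=True`, `text_generation` yields text chunks as they arrive):
```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # assumed model, as above
    token=os.environ.get("HF_TOKEN"),
)

def generate_stream(prompt: str, max_new_tokens: int = 800):
    partial = ""
    # stream=True yields generated text chunk by chunk instead of one large string.
    for chunk in client.text_generation(prompt, max_new_tokens=max_new_tokens, stream=True):
        partial += chunk
        yield partial  # Gradio renders successive yields as progressive UI updates
```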
## Performance Considerations
### Cache Strategy
- In-memory caching for active sessions
- Aggressive cache eviction (LRU)
- TTL-based expiration; both behaviors are combined in the sketch below
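`cachetools.TTLCache` combines both policies: LRU eviction once `maxsize` entries are held, plus per-item expiry after `ttl` seconds. A minimal sketch (the size and TTL values are assumptions):
```python
from cachetools import TTLCache

# LRU eviction at 256 entries, 5-minute per-item expiry.
response_cache = TTLCache(maxsize=256, ttl=300)

def cached_answer(query: str, compute) -> str:
    if query in response_cache:
        return response_cache[query]  # hit: no model call needed
    result = compute(query)           # miss: compute and remember
    response_cache[query] = result
    return result
```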
### Mobile Optimization
- Reduced max tokens for mobile (800 vs 2000)
- Shorter timeout (15 s vs 30 s)
- Lazy loading of UI components; a device-aware config sketch follows
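One way to apply those limits is a per-device config chosen from the request's User-Agent. A minimal sketch using Gradio's `gr.Request` (the UA heuristic is a deliberately naive assumption):
```python
import gradio as gr

LIMITS = {
    "mobile":  {"max_tokens": 800,  "timeout_s": 15},
    "desktop": {"max_tokens": 2000, "timeout_s": 30},
}

def limits_for(request: gr.Request) -> dict:
    # Naive User-Agent sniff; errs toward the conservative mobile budget.
    ua = (request.headers.get("user-agent") or "").lower()
    is_mobile = any(k in ua for k in ("mobile", "android", "iphone"))
    return LIMITS["mobile" if is_mobile else "desktop"]
```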
## Dependencies Compatibility Matrix
| Package | Version Range | Notes |
|---------|---------------|-------|
| Python | 3.9-3.11 | HF Spaces supported versions |
| PyTorch | 2.1.x | CPU build |
| Transformers | 4.35.x | Latest stable |
| Gradio | 4.x | Mobile support |
| FAISS | CPU-only (`faiss-cpu`) | No GPU support |
| NumPy | 1.24.x | Compatibility layer |
## Known Issues & Workarounds
### Issue: FAISS GPU Not Available
**Solution**: Use `faiss-cpu` in requirements.txt
### Issue: Model Loading Memory
**Solution**: Use the HF Inference API instead of loading models locally
### Issue: Session Storage Limits
**Solution**: Implement data compression and TTL-based cleanup
### Issue: Concurrent Request Limits
**Solution**: Implement a request queue with a max_workers limit (see the queue sketch below)
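In Gradio 4.x the built-in queue provides this; a minimal sketch (the concurrency and backlog limits are assumptions to tune under load):
```python
import gradio as gr

def answer(message: str) -> str:
    return "..."  # placeholder handler

demo = gr.Interface(fn=answer, inputs="text", outputs="text")

# Queue requests instead of running them all at once:
# default_concurrency_limit caps concurrent workers, max_size bounds the waiting backlog.
demo.queue(default_concurrency_limit=2, max_size=16)
demo.launch()
```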
## Testing Recommendations
1. Test in the ZeroGPU environment before going to production
2. Verify memory usage stays under 512 MB (a check sketch follows the list)
3. Test mobile responsiveness
4. Validate cache efficiency (target: >60% hit rate)
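A minimal sketch of the memory check from item 2, using psutil (wire it into a test or a periodic log line):
```python
import psutil

MEMORY_BUDGET_MB = 512  # budget from recommendation 2 above

def check_memory_budget() -> float:
    rss_mb = psutil.Process().memory_info().rss / (1024 * 1024)
    assert rss_mb < MEMORY_BUDGET_MB, f"RSS {rss_mb:.0f} MB exceeds {MEMORY_BUDGET_MB} MB budget"
    return rss_mb
```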