JatsTheAIGen committed on
Commit 291e38e · 1 Parent(s): a58b1f9

feat: Add ZeroGPU per-user mode (Option B: Multi-tenant)


- Add ZeroGPUUserManager for per-user account management
- Implement automatic user registration and approval
- Add user mapping database table for local-to-API user mapping
- Update LLM router to support both service account and per-user modes
- Add per-user mode configuration options
- Update ZeroGPU client to accept pre-existing tokens
- Add comprehensive documentation for per-user mode

Per-user mode features:
- Automatic user registration on first use
- Per-user usage tracking and statistics
- Per-user rate limits
- Better audit trail per user
- Multi-tenant support

Configuration:
- Set ZERO_GPU_PER_USER_MODE=true to enable
- Requires ZERO_GPU_ADMIN_EMAIL and ZERO_GPU_ADMIN_PASSWORD
- Falls back to service account mode if per-user fails

Database:
- Creates zero_gpu_user_mapping table
- Stores user credentials, tokens, and mapping
- Automatic token refresh and management

ZEROGPU_PER_USER_MODE.md ADDED
@@ -0,0 +1,238 @@
1
+ # ZeroGPU Per-User Mode (Option B: Multi-tenant)
2
+
3
+ ## Overview
4
+
5
+ Per-user mode creates a separate ZeroGPU API account for each application user, providing:
6
+ - ✅ **Per-user usage tracking** - Track usage statistics per user
7
+ - ✅ **Per-user rate limits** - Individual rate limits per user
8
+ - ✅ **Better audit trail** - Each user's requests logged separately
9
+ - ✅ **Multi-tenant support** - Ideal for multi-tenant applications
10
+
11
+ ## Configuration
12
+
13
+ ### Environment Variables
14
+
15
+ ```bash
16
+ # Enable ZeroGPU API
17
+ USE_ZERO_GPU=true
18
+
19
+ # Enable per-user mode (Option B)
20
+ ZERO_GPU_PER_USER_MODE=true
21
+
22
+ # ZeroGPU API base URL
23
+ ZERO_GPU_API_URL=http://your-pod-ip:8000
24
+
25
+ # Admin credentials (for creating/approving users)
26
+ ZERO_GPU_ADMIN_EMAIL=admin@example.com
27
+ ZERO_GPU_ADMIN_PASSWORD=your-admin-password
28
+
29
+ # Database path (for user mapping storage)
30
+ DB_PATH=sessions.db
31
+ ```
32
+
33
+ ### Service Account Mode (Option A)
34
+
35
+ If `ZERO_GPU_PER_USER_MODE=false` or not set, the system uses service account mode:
36
+
37
+ ```bash
38
+ USE_ZERO_GPU=true
39
+ ZERO_GPU_PER_USER_MODE=false # or omit
40
+ ZERO_GPU_API_URL=http://your-pod-ip:8000
41
+ ZERO_GPU_EMAIL=service@example.com
42
+ ZERO_GPU_PASSWORD=your-password
43
+ ```
44
+
45
+ ## How It Works
46
+
47
+ ### User Registration Flow
48
+
49
+ 1. **First Request**: When a user makes their first request:
50
+ - System checks if ZeroGPU account exists for this user
51
+ - If not, automatically registers new user with ZeroGPU API
52
+ - Generates unique email: `user_{hash}@zerogpu.local`
53
+ - Generates secure random password
54
+ - Auto-approves user via admin API
55
+ - Stores mapping in local database
56
+
57
+ 2. **Subsequent Requests**:
58
+ - System retrieves user's ZeroGPU client from cache or database
59
+ - Uses user's tokens for API calls
60
+ - Automatically refreshes tokens when needed
61
+
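The credential step of the registration flow can be sketched as follows. This is a minimal illustration that mirrors the hashing and password calls visible in `zero_gpu_user_manager.py`; the function name `generate_credentials` and the sample user ID are assumptions made for the example.

```python
import hashlib
import secrets


def generate_credentials(local_user_id: str) -> dict:
    """Derive a stable email and a fresh random password for a local user."""
    # The email is deterministic: the same local ID always maps to the same address.
    user_hash = hashlib.sha256(local_user_id.encode()).hexdigest()[:16]
    email = f"user_{user_hash}@zerogpu.local"
    # The password is random, so a second call yields a different value.
    password = secrets.token_urlsafe(32)
    return {"email": email, "password": password}


first = generate_credentials("User123")
second = generate_credentials("User123")
print(first["email"] == second["email"])        # emails match for the same user
print(first["password"] == second["password"])  # passwords differ between calls
```

Because only the email is reproducible, a later `/login` for the same user needs the original password to be kept in a recoverable form, or the account to be reset; the stored hash alone is not enough.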
62
+ ### Database Schema
63
+
64
+ The system creates a `zero_gpu_user_mapping` table:
65
+
66
+ ```sql
67
+ CREATE TABLE zero_gpu_user_mapping (
68
+ local_user_id TEXT PRIMARY KEY,
69
+ api_user_id INTEGER,
70
+ api_email TEXT UNIQUE,
71
+ api_password_hash TEXT,
72
+ access_token TEXT,
73
+ refresh_token TEXT,
74
+ token_expires_at TIMESTAMP,
75
+ is_approved INTEGER DEFAULT 0,
76
+ created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
77
+ last_used TIMESTAMP,
78
+ usage_stats_cache TEXT
79
+ );
80
+ ```
81
+
82
+ ### User Mapping
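As a quick way to try the schema, the sketch below creates the table in an in-memory SQLite database, inserts one mapping, and reads it back. The sample row values are made up for illustration, and the in-memory connection stands in for the real `sessions.db` file.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS zero_gpu_user_mapping (
    local_user_id TEXT PRIMARY KEY,
    api_user_id INTEGER,
    api_email TEXT UNIQUE,
    api_password_hash TEXT,
    access_token TEXT,
    refresh_token TEXT,
    token_expires_at TIMESTAMP,
    is_approved INTEGER DEFAULT 0,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_used TIMESTAMP,
    usage_stats_cache TEXT
)
"""

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute(SCHEMA)

# Insert a made-up mapping; columns not listed fall back to their defaults.
conn.execute(
    "INSERT INTO zero_gpu_user_mapping (local_user_id, api_user_id, api_email) "
    "VALUES (?, ?, ?)",
    ("User123", 1, "user_abc@zerogpu.local"),
)
conn.commit()

# Look the user up by primary key, as the manager does.
row = conn.execute(
    "SELECT api_user_id, api_email, is_approved "
    "FROM zero_gpu_user_mapping WHERE local_user_id = ?",
    ("User123",),
).fetchone()
missing = conn.execute(
    "SELECT api_user_id FROM zero_gpu_user_mapping WHERE local_user_id = ?",
    ("NoSuchUser",),
).fetchone()
print(row, missing)
```

A new row starts with `is_approved = 0`, and a lookup for an unknown user returns `None`, which is the case that triggers registration.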
83
+
84
+ - **Local User ID**: Your application's user identifier (e.g., "Admin_J", "User123")
85
+ - **API User ID**: ZeroGPU API's internal user ID
86
+ - **API Email**: Generated email for ZeroGPU account
87
+ - **Tokens**: Stored for authentication (auto-refreshed)
88
+
89
+ ## Usage
90
+
91
+ ### Automatic User Creation
92
+
93
+ Users are automatically created on first use:
94
+
95
+ ```python
96
+ # In your orchestrator or agent code
97
+ result = await llm_router.route_inference(
98
+ task_type="general_reasoning",
99
+ prompt="What is machine learning?",
100
+ user_id="User123" # Pass user_id for per-user mode
101
+ )
102
+ ```
103
+
104
+ ### Getting User Statistics
105
+
106
+ ```python
107
+ from zero_gpu_user_manager import ZeroGPUUserManager
108
+
109
+ # Initialize manager (usually done in LLMRouter)
110
+ user_manager = ZeroGPUUserManager(
111
+ base_url="http://your-pod-ip:8000",
112
+ admin_email="admin@example.com",
113
+ admin_password="password",
114
+ db_path="sessions.db"
115
+ )
116
+
117
+ # Get usage stats for a user
118
+ stats = user_manager.get_user_stats("User123")
119
+ # Returns: {
120
+ # "user_id": 1,
121
+ # "total_requests": 150,
122
+ # "total_tokens": 45000,
123
+ # "requests_by_task": {...},
124
+ # ...
125
+ # }
126
+ ```
127
+
128
+ ## Integration Points
129
+
130
+ ### LLM Router
131
+
132
+ The LLM router automatically handles per-user clients:
133
+
134
+ ```python
135
+ # In src/llm_router.py (simplified: in this commit the client lookup lives in _call_zero_gpu_endpoint)
136
+ async def route_inference(self, task_type: str, prompt: str,
137
+ context: Optional[List[Dict]] = None,
138
+ user_id: Optional[str] = None, **kwargs):
139
+ # ...
140
+ # Automatically gets or creates user client if per-user mode enabled
141
+ if self.zero_gpu_mode == "per_user" and user_id:
142
+ client = await self.zero_gpu_user_manager.get_or_create_user_client(user_id)
143
+ # ...
144
+ ```
145
+
146
+ ### Orchestrator Integration
147
+
148
+ Update the orchestrator to pass `user_id` to the router:
149
+
150
+ ```python
151
+ # In orchestrator_engine.py
152
+ result = await self.llm_router.route_inference(
153
+ task_type="general_reasoning",
154
+ prompt=prompt,
155
+ user_id=self.current_user_id # Pass user_id
156
+ )
157
+ ```
158
+
159
+ ## Advantages
160
+
161
+ 1. **Per-User Tracking**: Each user's usage tracked separately
162
+ 2. **Rate Limiting**: Per-user rate limits (60/min, 1000/hour, 10000/day)
163
+ 3. **Audit Trail**: Complete audit trail per user
164
+ 4. **Multi-Tenant**: Ideal for SaaS applications
165
+ 5. **Usage Analytics**: Per-user usage statistics available
166
+
167
+ ## Considerations
168
+
169
+ 1. **User Management Overhead**: Each user requires a ZeroGPU account
170
+ 2. **Token Storage**: Need to securely store user tokens
171
+ 3. **Password Management**: Generated passwords stored as hashes
172
+ 4. **Approval Workflow**: Users auto-approved via admin API
173
+
174
+ ## Security
175
+
176
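+ - **Password Storage**: Generated passwords are stored as unsalted SHA-256 hashes (`hashlib.sha256`)
177
+ - **Token Management**: Tokens are auto-refreshed and stored as plain text in the `zero_gpu_user_mapping` table, so access to the database file should be restricted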
178
+ - **Email Generation**: Deterministic but unique emails per user
179
+ - **Admin Access**: Admin credentials required for user approval
180
+
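To make the password-storage point concrete, here is a small sketch of checking a candidate password against a stored SHA-256 hex digest. The helper names `sha256_hex` and `verify_password` are hypothetical and do not appear in the commit; only the `hashlib.sha256(...).hexdigest()` call is taken from it.

```python
import hashlib
import hmac


def sha256_hex(password: str) -> str:
    """Hash a password the same way the user manager does."""
    return hashlib.sha256(password.encode()).hexdigest()


def verify_password(candidate: str, stored_hash: str) -> bool:
    """Return True when the candidate hashes to the stored digest."""
    # compare_digest compares in constant time to avoid timing leaks.
    return hmac.compare_digest(sha256_hex(candidate), stored_hash)


stored = sha256_hex("example-password")
print(verify_password("example-password", stored))
print(verify_password("wrong-password", stored))
```

The digest supports this kind of check but cannot be turned back into the password, so it cannot be used on its own to log the user in again.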
181
+ ## Migration from Service Account
182
+
183
+ To migrate from service account (Option A) to per-user (Option B):
184
+
185
+ 1. Set `ZERO_GPU_PER_USER_MODE=true`
186
+ 2. Set `ZERO_GPU_ADMIN_EMAIL` and `ZERO_GPU_ADMIN_PASSWORD`
187
+ 3. Restart application
188
+ 4. Users will be automatically created on first use
189
+
190
+ ## Troubleshooting
191
+
192
+ ### User Not Created
193
+
194
+ - Check admin credentials are correct
195
+ - Verify ZeroGPU API is accessible
196
+ - Check database permissions
197
+ - Review logs for registration errors
198
+
199
+ ### Token Refresh Issues
200
+
201
+ - Tokens auto-refresh on expiry
202
+ - If refresh fails, user will be re-authenticated
203
+ - Check ZeroGPU API availability
204
+
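When debugging refresh problems it can help to reason about expiry explicitly. The sketch below assumes the 15-minute lifetime that `_save_user_mapping` writes into `token_expires_at`; the function names and the 30-second safety margin are hypothetical choices for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Matches the '+15 minutes' offset written to token_expires_at in this commit.
TOKEN_LIFETIME = timedelta(minutes=15)


def expiry_for(issued_at: datetime) -> datetime:
    """Compute when a token issued at `issued_at` expires."""
    return issued_at + TOKEN_LIFETIME


def needs_refresh(expires_at: datetime,
                  now: Optional[datetime] = None,
                  margin: timedelta = timedelta(seconds=30)) -> bool:
    """Return True once the token is expired or within `margin` of expiring."""
    now = now or datetime.now(timezone.utc)
    return now >= expires_at - margin


issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
expires = expiry_for(issued)
print(needs_refresh(expires, now=issued + timedelta(minutes=10)))
print(needs_refresh(expires, now=issued + timedelta(minutes=16)))
```

Refreshing slightly before the recorded expiry avoids sending a request with a token that expires in flight.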
205
+ ### Performance
206
+
207
+ - User clients are cached in memory
208
+ - Database lookups use the `local_user_id` primary key; `api_email` is also indexed
209
+ - First request per user may be slower (registration)
210
+
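The in-memory caching behaviour can be sketched as a small get-or-create cache with a health check. `ClientCache`, `make_client`, and the dictionary "clients" are stand-ins invented for this example; the real manager caches `ZeroGPUChatClient` instances and calls their `health_check()` method.

```python
from typing import Any, Callable, Dict, List


class ClientCache:
    """Cache one client per user and rebuild it when it looks unhealthy."""

    def __init__(self, factory: Callable[[str], Any],
                 is_healthy: Callable[[Any], bool]) -> None:
        self._factory = factory
        self._is_healthy = is_healthy
        self._clients: Dict[str, Any] = {}

    def get(self, user_id: str) -> Any:
        client = self._clients.get(user_id)
        if client is not None and self._is_healthy(client):
            return client  # cache hit
        # Cache miss or stale client: build a new one and remember it.
        client = self._factory(user_id)
        self._clients[user_id] = client
        return client


created: List[str] = []


def make_client(user_id: str) -> dict:
    created.append(user_id)  # record each (slow) creation
    return {"user": user_id, "healthy": True}


cache = ClientCache(make_client, is_healthy=lambda c: c["healthy"])
a = cache.get("User123")
b = cache.get("User123")
print(a is b, created)
```

Only the first `get` for a user pays the creation cost; later calls reuse the cached object until the health check fails.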
211
+ ## Example Configuration
212
+
213
+ ```python
214
+ # config.py
215
+ zero_gpu_config = {
216
+ "enabled": True,
217
+ "base_url": "http://your-pod-ip:8000",
218
+ "per_user_mode": True, # Enable per-user mode
219
+ "admin_email": "admin@example.com",
220
+ "admin_password": "secure-password",
221
+ "db_path": "sessions.db"
222
+ }
223
+ ```
224
+
225
+ ## API Endpoints Used
226
+
227
+ - `POST /register` - Register new user
228
+ - `POST /login` - Login and get tokens
229
+ - `POST /admin/approve-user` - Approve user (admin)
230
+ - `GET /usage/stats` - Get usage statistics
231
+ - `POST /chat` - Make inference request
232
+
233
+ ---
234
+
235
+ **Status**: ✅ Implemented
236
+ **Mode**: Per-User Accounts (Multi-tenant)
237
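+ **Fallback**: Service account mode if the per-user manager fails to initialize; if a per-user client cannot be obtained at request time, the router uses the service account client when one exists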
238
+
app.py CHANGED
@@ -2028,14 +2028,36 @@ def initialize_orchestrator():
2028
  zero_gpu_config = None
2029
  try:
2030
  from config import settings
2031
- if settings.zero_gpu_enabled and settings.zero_gpu_email and settings.zero_gpu_password:
2032
  zero_gpu_config = {
2033
  "enabled": True,
2034
  "base_url": settings.zero_gpu_base_url,
2035
- "email": settings.zero_gpu_email,
2036
- "password": settings.zero_gpu_password
2037
  }
2038
- logger.info("ZeroGPU API enabled in configuration")
2039
  except Exception as e:
2040
  logger.debug(f"Could not load ZeroGPU config: {e}")
2041
 
 
2028
  zero_gpu_config = None
2029
  try:
2030
  from config import settings
2031
+ if settings.zero_gpu_enabled:
2032
  zero_gpu_config = {
2033
  "enabled": True,
2034
  "base_url": settings.zero_gpu_base_url,
2035
+ "per_user_mode": settings.zero_gpu_per_user_mode
 
2036
  }
2037
+
2038
+ if settings.zero_gpu_per_user_mode:
2039
+ # Option B: Per-user accounts (multi-tenant)
2040
+ if settings.zero_gpu_admin_email and settings.zero_gpu_admin_password:
2041
+ zero_gpu_config.update({
2042
+ "admin_email": settings.zero_gpu_admin_email,
2043
+ "admin_password": settings.zero_gpu_admin_password,
2044
+ "db_path": settings.db_path
2045
+ })
2046
+ logger.info("ZeroGPU API enabled in per-user mode (multi-tenant)")
2047
+ else:
2048
+ logger.warning("ZeroGPU per-user mode enabled but admin credentials not provided")
2049
+ zero_gpu_config = None
2050
+ else:
2051
+ # Option A: Service account (single-tenant)
2052
+ if settings.zero_gpu_email and settings.zero_gpu_password:
2053
+ zero_gpu_config.update({
2054
+ "email": settings.zero_gpu_email,
2055
+ "password": settings.zero_gpu_password
2056
+ })
2057
+ logger.info("ZeroGPU API enabled in service account mode")
2058
+ else:
2059
+ logger.warning("ZeroGPU enabled but credentials not provided")
2060
+ zero_gpu_config = None
2061
  except Exception as e:
2062
  logger.debug(f"Could not load ZeroGPU config: {e}")
2063
 
config.py CHANGED
@@ -41,6 +41,10 @@ class Settings(BaseSettings):
41
  zero_gpu_base_url: str = os.getenv("ZERO_GPU_API_URL", "http://localhost:8000")
42
  zero_gpu_email: str = os.getenv("ZERO_GPU_EMAIL", "")
43
  zero_gpu_password: str = os.getenv("ZERO_GPU_PASSWORD", "")
44
 
45
  class Config:
46
  env_file = ".env"
 
41
  zero_gpu_base_url: str = os.getenv("ZERO_GPU_API_URL", "http://localhost:8000")
42
  zero_gpu_email: str = os.getenv("ZERO_GPU_EMAIL", "")
43
  zero_gpu_password: str = os.getenv("ZERO_GPU_PASSWORD", "")
44
+ # Per-user mode (Option B: Multi-tenant)
45
+ zero_gpu_per_user_mode: bool = os.getenv("ZERO_GPU_PER_USER_MODE", "false").lower() == "true"
46
+ zero_gpu_admin_email: str = os.getenv("ZERO_GPU_ADMIN_EMAIL", "")
47
+ zero_gpu_admin_password: str = os.getenv("ZERO_GPU_ADMIN_PASSWORD", "")
48
 
49
  class Config:
50
  env_file = ".env"
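The flag parsing added to `Settings` accepts only the string `true`, compared case-insensitively. The sketch below reproduces that comparison in a standalone helper (the name `per_user_mode_enabled` and the `environ` parameter are assumptions for the example) to show which values enable the mode.

```python
import os
from typing import Mapping


def per_user_mode_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Mirror the comparison used for zero_gpu_per_user_mode in config.py."""
    return environ.get("ZERO_GPU_PER_USER_MODE", "false").lower() == "true"


for value in ("true", "TRUE", "1", "yes", "true "):
    print(repr(value), per_user_mode_enabled({"ZERO_GPU_PER_USER_MODE": value}))
```

Values such as `1`, `yes`, or `true` with trailing whitespace leave the mode disabled, which is worth keeping in mind when editing `.env` files.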
flask_api_standalone.py CHANGED
@@ -59,14 +59,36 @@ def initialize_orchestrator():
59
  zero_gpu_config = None
60
  try:
61
  from config import settings
62
- if settings.zero_gpu_enabled and settings.zero_gpu_email and settings.zero_gpu_password:
63
  zero_gpu_config = {
64
  "enabled": True,
65
  "base_url": settings.zero_gpu_base_url,
66
- "email": settings.zero_gpu_email,
67
- "password": settings.zero_gpu_password
68
  }
69
- logger.info("ZeroGPU API enabled in configuration")
70
  except Exception as e:
71
  logger.debug(f"Could not load ZeroGPU config: {e}")
72
 
 
59
  zero_gpu_config = None
60
  try:
61
  from config import settings
62
+ if settings.zero_gpu_enabled:
63
  zero_gpu_config = {
64
  "enabled": True,
65
  "base_url": settings.zero_gpu_base_url,
66
+ "per_user_mode": settings.zero_gpu_per_user_mode
 
67
  }
68
+
69
+ if settings.zero_gpu_per_user_mode:
70
+ # Option B: Per-user accounts (multi-tenant)
71
+ if settings.zero_gpu_admin_email and settings.zero_gpu_admin_password:
72
+ zero_gpu_config.update({
73
+ "admin_email": settings.zero_gpu_admin_email,
74
+ "admin_password": settings.zero_gpu_admin_password,
75
+ "db_path": settings.db_path
76
+ })
77
+ logger.info("ZeroGPU API enabled in per-user mode (multi-tenant)")
78
+ else:
79
+ logger.warning("ZeroGPU per-user mode enabled but admin credentials not provided")
80
+ zero_gpu_config = None
81
+ else:
82
+ # Option A: Service account (single-tenant)
83
+ if settings.zero_gpu_email and settings.zero_gpu_password:
84
+ zero_gpu_config.update({
85
+ "email": settings.zero_gpu_email,
86
+ "password": settings.zero_gpu_password
87
+ })
88
+ logger.info("ZeroGPU API enabled in service account mode")
89
+ else:
90
+ logger.warning("ZeroGPU enabled but credentials not provided")
91
+ zero_gpu_config = None
92
  except Exception as e:
93
  logger.debug(f"Could not load ZeroGPU config: {e}")
94
 
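`app.py` and `flask_api_standalone.py` now carry the same configuration block. One possible way to share it, sketched here as a pure function, is shown below. The name `build_zero_gpu_config` is hypothetical, logging is omitted for brevity, and `settings` is assumed to expose the attributes used in the diff (including `db_path`).

```python
from types import SimpleNamespace
from typing import Any, Optional


def build_zero_gpu_config(settings: Any) -> Optional[dict]:
    """Return the ZeroGPU config dict, or None when ZeroGPU cannot be used."""
    if not settings.zero_gpu_enabled:
        return None

    config = {
        "enabled": True,
        "base_url": settings.zero_gpu_base_url,
        "per_user_mode": settings.zero_gpu_per_user_mode,
    }

    if settings.zero_gpu_per_user_mode:
        # Option B: per-user accounts need admin credentials.
        if not (settings.zero_gpu_admin_email and settings.zero_gpu_admin_password):
            return None
        config.update({
            "admin_email": settings.zero_gpu_admin_email,
            "admin_password": settings.zero_gpu_admin_password,
            "db_path": settings.db_path,
        })
    else:
        # Option A: service account needs its own credentials.
        if not (settings.zero_gpu_email and settings.zero_gpu_password):
            return None
        config.update({
            "email": settings.zero_gpu_email,
            "password": settings.zero_gpu_password,
        })
    return config


example_settings = SimpleNamespace(
    zero_gpu_enabled=True,
    zero_gpu_base_url="http://localhost:8000",
    zero_gpu_per_user_mode=True,
    zero_gpu_admin_email="admin@example.com",
    zero_gpu_admin_password="example-password",
    zero_gpu_email="",
    zero_gpu_password="",
    db_path="sessions.db",
)
print(build_zero_gpu_config(example_settings))
```

As in the diff, enabling per-user mode without admin credentials yields no configuration at all, so ZeroGPU is disabled for that run instead of switching to the service account.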
src/llm_router.py CHANGED
@@ -13,8 +13,10 @@ class LLMRouter:
13
  self.health_status = {}
14
  self.use_local_models = use_local_models
15
  self.local_loader = None
16
- self.zero_gpu_client = None
 
17
  self.use_zero_gpu = False
 
18
 
19
  logger.info("LLMRouter initialized")
20
  if hf_token:
@@ -24,32 +26,63 @@ class LLMRouter:
24
 
25
  # Initialize ZeroGPU client if configured
26
  if zero_gpu_config and zero_gpu_config.get("enabled", False):
27
- try:
28
- from zero_gpu_client import ZeroGPUChatClient
29
- base_url = zero_gpu_config.get("base_url", os.getenv("ZERO_GPU_API_URL", "http://localhost:8000"))
30
- email = zero_gpu_config.get("email", os.getenv("ZERO_GPU_EMAIL", ""))
31
- password = zero_gpu_config.get("password", os.getenv("ZERO_GPU_PASSWORD", ""))
32
-
33
- if email and password:
34
- self.zero_gpu_client = ZeroGPUChatClient(base_url, email, password)
35
- self.use_zero_gpu = True
36
- logger.info(" ZeroGPU API client initialized")
 
37
 
38
- # Wait for API to be ready (non-blocking, will fallback if not ready)
39
- try:
40
- if not self.zero_gpu_client.wait_for_ready(timeout=10):
41
- logger.warning("ZeroGPU API not ready, will use HF fallback")
42
  self.use_zero_gpu = False
43
- except Exception as e:
44
- logger.warning(f"Could not verify ZeroGPU API readiness: {e}. Will use HF fallback.")
45
- self.use_zero_gpu = False
46
- else:
47
- logger.warning("ZeroGPU enabled but credentials not provided")
48
- except ImportError:
49
- logger.warning("zero_gpu_client not available, ZeroGPU disabled")
50
- except Exception as e:
51
- logger.warning(f"Could not initialize ZeroGPU client: {e}. Falling back to HF API.")
52
- self.use_zero_gpu = False
53
 
54
  # Initialize local model loader if enabled
55
  if self.use_local_models:
@@ -67,10 +100,17 @@ class LLMRouter:
67
  self.use_local_models = False
68
  self.local_loader = None
69
 
70
- async def route_inference(self, task_type: str, prompt: str, context: Optional[List[Dict]] = None, **kwargs):
71
  """
72
  Smart routing based on task specialization
73
  Tries local models first, then ZeroGPU API, falls back to HF Inference API if needed
74
  """
75
  logger.info(f"Routing inference for task: {task_type}")
76
  model_config = self._select_model(task_type)
@@ -95,9 +135,9 @@ class LLMRouter:
95
  logger.debug("Exception details:", exc_info=True)
96
 
97
  # Try ZeroGPU API if enabled
98
- if self.use_zero_gpu and self.zero_gpu_client:
99
  try:
100
- result = await self._call_zero_gpu_endpoint(task_type, prompt, context, **kwargs)
101
  if result is not None:
102
  logger.info(f"Inference complete for {task_type} (ZeroGPU API)")
103
  return result
@@ -194,7 +234,7 @@ class LLMRouter:
194
  logger.error(f"Error calling local embedding model: {e}", exc_info=True)
195
  return None
196
 
197
- async def _call_zero_gpu_endpoint(self, task_type: str, prompt: str, context: Optional[List[Dict]] = None, **kwargs) -> Optional[str]:
198
  """
199
  Call ZeroGPU API endpoint
200
 
@@ -202,12 +242,25 @@ class LLMRouter:
202
  task_type: Task type (e.g., "intent_classification", "general_reasoning")
203
  prompt: User prompt/message
204
  context: Optional conversation context
 
205
  **kwargs: Additional generation parameters
206
 
207
  Returns:
208
  Generated text response or None if failed
209
  """
210
- if not self.zero_gpu_client:
211
  return None
212
 
213
  try:
@@ -254,7 +307,7 @@ class LLMRouter:
254
  generation_params["system_prompt"] = kwargs['system_prompt']
255
 
256
  # Call ZeroGPU API
257
- response = self.zero_gpu_client.chat(
258
  message=prompt,
259
  task=zero_gpu_task,
260
  context=context_messages,
 
13
  self.health_status = {}
14
  self.use_local_models = use_local_models
15
  self.local_loader = None
16
+ self.zero_gpu_client = None # Service account client (Option A)
17
+ self.zero_gpu_user_manager = None # Per-user manager (Option B)
18
  self.use_zero_gpu = False
19
+ self.zero_gpu_mode = "service_account" # "service_account" or "per_user"
20
 
21
  logger.info("LLMRouter initialized")
22
  if hf_token:
 
26
 
27
  # Initialize ZeroGPU client if configured
28
  if zero_gpu_config and zero_gpu_config.get("enabled", False):
29
+ # Check if per-user mode is enabled
30
+ per_user_mode = zero_gpu_config.get("per_user_mode", False)
31
+
32
+ if per_user_mode:
33
+ # Option B: Per-User Accounts (Multi-tenant)
34
+ try:
35
+ from zero_gpu_user_manager import ZeroGPUUserManager
36
+ base_url = zero_gpu_config.get("base_url", os.getenv("ZERO_GPU_API_URL", "http://localhost:8000"))
37
+ admin_email = zero_gpu_config.get("admin_email", os.getenv("ZERO_GPU_ADMIN_EMAIL", ""))
38
+ admin_password = zero_gpu_config.get("admin_password", os.getenv("ZERO_GPU_ADMIN_PASSWORD", ""))
39
+ db_path = zero_gpu_config.get("db_path", os.getenv("DB_PATH", "sessions.db"))
40
 
41
+ if admin_email and admin_password:
42
+ self.zero_gpu_user_manager = ZeroGPUUserManager(
43
+ base_url, admin_email, admin_password, db_path
44
+ )
45
+ self.use_zero_gpu = True
46
+ self.zero_gpu_mode = "per_user"
47
+ logger.info("✓ ZeroGPU per-user mode enabled (multi-tenant)")
48
+ else:
49
+ logger.warning("ZeroGPU per-user mode enabled but admin credentials not provided")
50
+ except ImportError:
51
+ logger.warning("zero_gpu_user_manager not available, falling back to service account mode")
52
+ per_user_mode = False
53
+ except Exception as e:
54
+ logger.warning(f"Could not initialize ZeroGPU user manager: {e}. Falling back to service account mode.")
55
+ per_user_mode = False
56
+
57
+ if not per_user_mode:
58
+ # Option A: Service Account (Single-tenant)
59
+ try:
60
+ from zero_gpu_client import ZeroGPUChatClient
61
+ base_url = zero_gpu_config.get("base_url", os.getenv("ZERO_GPU_API_URL", "http://localhost:8000"))
62
+ email = zero_gpu_config.get("email", os.getenv("ZERO_GPU_EMAIL", ""))
63
+ password = zero_gpu_config.get("password", os.getenv("ZERO_GPU_PASSWORD", ""))
64
+
65
+ if email and password:
66
+ self.zero_gpu_client = ZeroGPUChatClient(base_url, email, password)
67
+ self.use_zero_gpu = True
68
+ self.zero_gpu_mode = "service_account"
69
+ logger.info("✓ ZeroGPU API client initialized (service account mode)")
70
+
71
+ # Wait for API to be ready (non-blocking, will fallback if not ready)
72
+ try:
73
+ if not self.zero_gpu_client.wait_for_ready(timeout=10):
74
+ logger.warning("ZeroGPU API not ready, will use HF fallback")
75
+ self.use_zero_gpu = False
76
+ except Exception as e:
77
+ logger.warning(f"Could not verify ZeroGPU API readiness: {e}. Will use HF fallback.")
78
  self.use_zero_gpu = False
79
+ else:
80
+ logger.warning("ZeroGPU enabled but credentials not provided")
81
+ except ImportError:
82
+ logger.warning("zero_gpu_client not available, ZeroGPU disabled")
83
+ except Exception as e:
84
+ logger.warning(f"Could not initialize ZeroGPU client: {e}. Falling back to HF API.")
85
+ self.use_zero_gpu = False
 
 
 
86
 
87
  # Initialize local model loader if enabled
88
  if self.use_local_models:
 
100
  self.use_local_models = False
101
  self.local_loader = None
102
 
103
+ async def route_inference(self, task_type: str, prompt: str, context: Optional[List[Dict]] = None, user_id: Optional[str] = None, **kwargs):
104
  """
105
  Smart routing based on task specialization
106
  Tries local models first, then ZeroGPU API, falls back to HF Inference API if needed
107
+
108
+ Args:
109
+ task_type: Task type (e.g., "intent_classification", "general_reasoning")
110
+ prompt: User prompt/message
111
+ context: Optional conversation context
112
+ user_id: Optional user ID for per-user ZeroGPU accounts (Option B)
113
+ **kwargs: Additional generation parameters
114
  """
115
  logger.info(f"Routing inference for task: {task_type}")
116
  model_config = self._select_model(task_type)
 
135
  logger.debug("Exception details:", exc_info=True)
136
 
137
  # Try ZeroGPU API if enabled
138
+ if self.use_zero_gpu:
139
  try:
140
+ result = await self._call_zero_gpu_endpoint(task_type, prompt, context, user_id, **kwargs)
141
  if result is not None:
142
  logger.info(f"Inference complete for {task_type} (ZeroGPU API)")
143
  return result
 
234
  logger.error(f"Error calling local embedding model: {e}", exc_info=True)
235
  return None
236
 
237
+ async def _call_zero_gpu_endpoint(self, task_type: str, prompt: str, context: Optional[List[Dict]] = None, user_id: Optional[str] = None, **kwargs) -> Optional[str]:
238
  """
239
  Call ZeroGPU API endpoint
240
 
 
242
  task_type: Task type (e.g., "intent_classification", "general_reasoning")
243
  prompt: User prompt/message
244
  context: Optional conversation context
245
+ user_id: Optional user ID for per-user accounts (Option B)
246
  **kwargs: Additional generation parameters
247
 
248
  Returns:
249
  Generated text response or None if failed
250
  """
251
+ # Get appropriate client based on mode
252
+ client = None
253
+ if self.zero_gpu_mode == "per_user" and self.zero_gpu_user_manager and user_id:
254
+ # Option B: Per-user accounts
255
+ client = await self.zero_gpu_user_manager.get_or_create_user_client(user_id)
256
+ if not client:
257
+ logger.warning(f"Could not get ZeroGPU client for user {user_id}, falling back to service account")
258
+ client = self.zero_gpu_client
259
+ else:
260
+ # Option A: Service account
261
+ client = self.zero_gpu_client
262
+
263
+ if not client:
264
  return None
265
 
266
  try:
 
307
  generation_params["system_prompt"] = kwargs['system_prompt']
308
 
309
  # Call ZeroGPU API
310
+ response = client.chat(
311
  message=prompt,
312
  task=zero_gpu_task,
313
  context=context_messages,
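The client selection at the top of `_call_zero_gpu_endpoint` can be exercised in isolation. The sketch below copies its branching into a standalone coroutine; `StubUserManager` and `pick_client` are hypothetical names, and plain strings stand in for real client objects.

```python
import asyncio
from typing import Any, Dict, Optional


class StubUserManager:
    """Hypothetical stand-in for ZeroGPUUserManager."""

    def __init__(self, clients: Dict[str, Any]) -> None:
        self._clients = clients

    async def get_or_create_user_client(self, user_id: str) -> Optional[Any]:
        # Returns None for unknown users to simulate a failed registration.
        return self._clients.get(user_id)


async def pick_client(mode: str, user_manager: Optional[StubUserManager],
                      service_client: Optional[Any],
                      user_id: Optional[str]) -> Optional[Any]:
    """Same branching as the diff: per-user first, then the service account."""
    if mode == "per_user" and user_manager and user_id:
        client = await user_manager.get_or_create_user_client(user_id)
        if not client:
            client = service_client  # may itself be None
    else:
        client = service_client
    return client


manager = StubUserManager({"User123": "client-for-User123"})
print(asyncio.run(pick_client("per_user", manager, None, "User123")))
print(asyncio.run(pick_client("per_user", manager, None, "Unknown")))
print(asyncio.run(pick_client("per_user", manager, "service-client", None)))
```

As far as this diff shows, `self.zero_gpu_client` is only created when per-user initialization is skipped or fails, so in a working per-user setup a failed lookup leaves the client as `None`, the method returns `None`, and `route_inference` moves on to the Hugging Face Inference API.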
zero_gpu_client.py CHANGED
@@ -15,7 +15,7 @@ logger = logging.getLogger(__name__)
15
  class ZeroGPUChatClient:
16
  """Client for ZeroGPU Chat API with automatic token refresh"""
17
 
18
- def __init__(self, base_url: str, email: str, password: str):
19
  """
20
  Initialize ZeroGPU API client
21
 
@@ -23,16 +23,24 @@ class ZeroGPUChatClient:
23
  base_url: Base URL of ZeroGPU API (e.g., "http://your-pod-ip:8000")
24
  email: User email for authentication
25
  password: User password for authentication
 
 
26
  """
27
  self.base_url = base_url.rstrip('/')
28
  self.email = email
29
  self.password = password
30
- self.access_token = None
31
- self.refresh_token = None
32
  self._last_token_refresh = None
33
 
34
  logger.info(f"Initializing ZeroGPU client for {self.base_url}")
35
- self.login(email, password)
36
 
37
  def login(self, email: str, password: str):
38
  """Login and get authentication tokens"""
 
15
  class ZeroGPUChatClient:
16
  """Client for ZeroGPU Chat API with automatic token refresh"""
17
 
18
+ def __init__(self, base_url: str, email: str, password: str, access_token: str = None, refresh_token: str = None):
19
  """
20
  Initialize ZeroGPU API client
21
 
 
23
  base_url: Base URL of ZeroGPU API (e.g., "http://your-pod-ip:8000")
24
  email: User email for authentication
25
  password: User password for authentication
26
+ access_token: Optional pre-existing access token
27
+ refresh_token: Optional pre-existing refresh token
28
  """
29
  self.base_url = base_url.rstrip('/')
30
  self.email = email
31
  self.password = password
32
+ self.access_token = access_token
33
+ self.refresh_token = refresh_token
34
  self._last_token_refresh = None
35
 
36
  logger.info(f"Initializing ZeroGPU client for {self.base_url}")
37
+
38
+ # If tokens provided, use them; otherwise login
39
+ if access_token and refresh_token:
40
+ self._last_token_refresh = time.time()
41
+ logger.info("Using provided tokens")
42
+ else:
43
+ self.login(email, password)
44
 
45
  def login(self, email: str, password: str):
46
  """Login and get authentication tokens"""
zero_gpu_user_manager.py ADDED
@@ -0,0 +1,411 @@
1
+ # zero_gpu_user_manager.py
2
+ """
3
+ ZeroGPU User Management - Per-User Accounts (Multi-tenant)
4
+ Manages mapping between local users and ZeroGPU API user accounts
5
+ """
6
+ import logging
7
+ import sqlite3
8
+ import hashlib
9
+ import secrets
10
+ from typing import Optional, Dict, Any
11
+ from datetime import datetime
12
+ from zero_gpu_client import ZeroGPUChatClient
13
+
14
+ logger = logging.getLogger(__name__)
15
+
16
+
17
+ class ZeroGPUUserManager:
18
+ """Manages per-user ZeroGPU API accounts with automatic registration and token management"""
19
+
20
+ def __init__(self, base_url: str, admin_email: str, admin_password: str, db_path: str = "sessions.db"):
21
+ """
22
+ Initialize user manager
23
+
24
+ Args:
25
+ base_url: ZeroGPU API base URL
26
+ admin_email: Admin email for creating/approving users
27
+ admin_password: Admin password
28
+ db_path: Path to database for user mapping storage
29
+ """
30
+ self.base_url = base_url
31
+ self.admin_email = admin_email
32
+ self.admin_password = admin_password
33
+ self.db_path = db_path
34
+ self.admin_client = None
35
+ self.user_clients = {} # Cache of ZeroGPU clients per user
36
+
37
+ # Initialize admin client for user management operations
38
+ try:
39
+ self.admin_client = ZeroGPUChatClient(base_url, admin_email, admin_password)
40
+ logger.info("✓ ZeroGPU admin client initialized")
41
+ except Exception as e:
42
+ logger.error(f"Failed to initialize admin client: {e}")
43
+ raise
44
+
45
+ # Initialize database
46
+ self._init_database()
47
+
48
+ def _init_database(self):
49
+ """Initialize database tables for user mapping"""
50
+ try:
51
+ conn = sqlite3.connect(self.db_path)
52
+ cursor = conn.cursor()
53
+
54
+ # Create user mapping table
55
+ cursor.execute("""
56
+ CREATE TABLE IF NOT EXISTS zero_gpu_user_mapping (
57
+ local_user_id TEXT PRIMARY KEY,
58
+ api_user_id INTEGER,
59
+ api_email TEXT UNIQUE,
60
+ api_password_hash TEXT,
61
+ access_token TEXT,
62
+ refresh_token TEXT,
63
+ token_expires_at TIMESTAMP,
64
+ is_approved INTEGER DEFAULT 0,
65
+ created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
66
+ last_used TIMESTAMP,
67
+ usage_stats_cache TEXT
68
+ )
69
+ """)
70
+
71
+ # Create index for faster lookups
72
+ cursor.execute("""
73
+ CREATE INDEX IF NOT EXISTS idx_zero_gpu_email
74
+ ON zero_gpu_user_mapping(api_email)
75
+ """)
76
+
77
+ conn.commit()
78
+ conn.close()
79
+ logger.info("✓ ZeroGPU user mapping database initialized")
80
+ except Exception as e:
81
+ logger.error(f"Failed to initialize user mapping database: {e}")
82
+ raise
83
+
84
+ def _generate_user_credentials(self, local_user_id: str) -> Dict[str, str]:
85
+ """
86
+ Generate ZeroGPU API credentials for a local user
87
+
88
+ Args:
89
+ local_user_id: Local application user ID
90
+
91
+ Returns:
92
+ Dictionary with email, password, and password hash
93
+ """
94
+ # Generate deterministic but unique email based on local user ID
95
+ # Format: user_{hash}@zerogpu.local
96
+ user_hash = hashlib.sha256(local_user_id.encode()).hexdigest()[:16]
97
+ email = f"user_{user_hash}@zerogpu.local"
98
+
99
+ # Generate secure random password
100
+ password = secrets.token_urlsafe(32)
101
+ password_hash = hashlib.sha256(password.encode()).hexdigest()
102
+
103
+ return {
104
+ "email": email,
105
+ "password": password,
106
+ "password_hash": password_hash
107
+ }
108
+
109
+ def _get_user_mapping(self, local_user_id: str) -> Optional[Dict[str, Any]]:
110
+ """Get user mapping from database"""
111
+ try:
112
+ conn = sqlite3.connect(self.db_path)
113
+ cursor = conn.cursor()
114
+ cursor.execute("""
115
+ SELECT local_user_id, api_user_id, api_email, api_password_hash,
116
+ access_token, refresh_token, token_expires_at, is_approved,
117
+ last_used, usage_stats_cache
118
+ FROM zero_gpu_user_mapping
119
+ WHERE local_user_id = ?
120
+ """, (local_user_id,))
121
+
122
+ row = cursor.fetchone()
123
+ conn.close()
124
+
125
+ if row:
126
+ return {
127
+ "local_user_id": row[0],
128
+ "api_user_id": row[1],
129
+ "api_email": row[2],
130
+ "api_password_hash": row[3],
131
+ "access_token": row[4],
132
+ "refresh_token": row[5],
133
+ "token_expires_at": row[6],
134
+ "is_approved": bool(row[7]),
135
+ "last_used": row[8],
136
+ "usage_stats_cache": row[9]
137
+ }
138
+ return None
139
+ except Exception as e:
140
+ logger.error(f"Error getting user mapping: {e}")
141
+ return None
142
+
143
+ def _save_user_mapping(self, local_user_id: str, api_user_id: int, api_email: str,
144
+ password_hash: str, access_token: str = None,
145
+ refresh_token: str = None):
146
+ """Save user mapping to database"""
147
+ try:
148
+ conn = sqlite3.connect(self.db_path)
149
+ cursor = conn.cursor()
150
+ cursor.execute("""
151
+ INSERT OR REPLACE INTO zero_gpu_user_mapping
152
+ (local_user_id, api_user_id, api_email, api_password_hash,
153
+ access_token, refresh_token, token_expires_at, last_used)
154
+ VALUES (?, ?, ?, ?, ?, ?, datetime('now', '+15 minutes'), datetime('now'))
155
+ """, (local_user_id, api_user_id, api_email, password_hash,
156
+ access_token, refresh_token))
157
+ conn.commit()
158
+ conn.close()
159
+ except Exception as e:
160
+ logger.error(f"Error saving user mapping: {e}")
161
+
+    def _update_user_tokens(self, local_user_id: str, access_token: str, refresh_token: str):
+        """Update user tokens in database"""
+        try:
+            conn = sqlite3.connect(self.db_path)
+            cursor = conn.cursor()
+            cursor.execute("""
+                UPDATE zero_gpu_user_mapping
+                SET access_token = ?, refresh_token = ?,
+                    token_expires_at = datetime('now', '+15 minutes'),
+                    last_used = datetime('now')
+                WHERE local_user_id = ?
+            """, (access_token, refresh_token, local_user_id))
+            conn.commit()
+            conn.close()
+        except Exception as e:
+            logger.error(f"Error updating user tokens: {e}")
+
+    def _update_approval_status(self, local_user_id: str, is_approved: bool):
+        """Update user approval status"""
+        try:
+            conn = sqlite3.connect(self.db_path)
+            cursor = conn.cursor()
+            cursor.execute("""
+                UPDATE zero_gpu_user_mapping
+                SET is_approved = ?
+                WHERE local_user_id = ?
+            """, (1 if is_approved else 0, local_user_id))
+            conn.commit()
+            conn.close()
+        except Exception as e:
+            logger.error(f"Error updating approval status: {e}")
+
+    async def get_or_create_user_client(self, local_user_id: str) -> Optional[ZeroGPUChatClient]:
+        """
+        Get or create ZeroGPU client for a local user
+
+        Args:
+            local_user_id: Local application user ID
+
+        Returns:
+            ZeroGPUChatClient instance or None if failed
+        """
+        # Check cache first
+        if local_user_id in self.user_clients:
+            client = self.user_clients[local_user_id]
+            # Verify client is still valid
+            if client.health_check():
+                return client
+            else:
+                # Remove invalid client from cache
+                del self.user_clients[local_user_id]
+
+        # Get user mapping
+        mapping = self._get_user_mapping(local_user_id)
+
+        if mapping:
+            # User exists, try to create client.
+            # Only the password hash is stored, but login needs the password itself,
+            # so the password is regenerated deterministically. This works because
+            # credential generation is under our control; storing the password
+            # encrypted would be an alternative.
+            try:
+                creds = self._generate_user_credentials(local_user_id)
+
+                # Verify the hash matches (security check)
+                if creds["password_hash"] != mapping["api_password_hash"]:
+                    logger.error(f"Password hash mismatch for user {local_user_id}")
+                    return None
+
+                # Create client with regenerated password
+                client = ZeroGPUChatClient(
+                    self.base_url,
+                    mapping["api_email"],
+                    creds["password"],  # Regenerated deterministically
+                    mapping.get("access_token"),
+                    mapping.get("refresh_token")
+                )
+
+                # Cache client
+                self.user_clients[local_user_id] = client
+                return client
+            except Exception as e:
+                logger.error(f"Error creating client for existing user: {e}")
+                return None
+        else:
+            # New user - register with ZeroGPU API
+            return await self._register_new_user(local_user_id)
+
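The method above relies on `_generate_user_credentials` returning the same password for the same user on every call. Its implementation is not shown in this part of the diff, so the following is only a sketch of one way such a derivation could work, assuming a server-side secret and using HMAC-SHA256. The function name `derive_credentials`, the secret, and the `example.invalid` email domain are all hypothetical.

```python
import hashlib
import hmac
from typing import Dict


def derive_credentials(secret: bytes, local_user_id: str) -> Dict[str, str]:
    """Derive stable per-user credentials from a server-side secret.

    Hypothetical scheme for illustration; not the commit's implementation.
    """
    # HMAC ties the output to both the secret and the user ID, so the same
    # inputs always reproduce the same credentials.
    digest = hmac.new(secret, local_user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    password = digest[:32]
    return {
        "email": f"user-{digest[32:44]}@example.invalid",  # placeholder domain
        "password": password,
        "password_hash": hashlib.sha256(password.encode("utf-8")).hexdigest(),
    }
```

With a scheme like this, anyone holding the secret can recompute every user's password, so the secret would need the same protection as a stored credential.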
+    async def _register_new_user(self, local_user_id: str) -> Optional[ZeroGPUChatClient]:
+        """
+        Register a new user with ZeroGPU API
+
+        Args:
+            local_user_id: Local application user ID
+
+        Returns:
+            ZeroGPUChatClient instance or None if failed
+        """
+        try:
+            # Generate credentials
+            creds = self._generate_user_credentials(local_user_id)
+
+            # Register user with ZeroGPU API
+            import hashlib
+            import requests
+
+            # Built-in hash() is salted per process for strings, so derive the
+            # placeholder mobile number from a stable digest instead.
+            mobile_digits = int(hashlib.sha256(local_user_id.encode("utf-8")).hexdigest(), 16) % 10**10
+            response = requests.post(
+                f"{self.base_url}/register",
+                json={
+                    "full_name": f"User {local_user_id}",
+                    "email": creds["email"],
+                    "mobile": f"+1{mobile_digits:010d}",  # Generate fake mobile
+                    "password": creds["password"]
+                },
+                timeout=10
+            )
+
+            if response.status_code == 200:
+                user_data = response.json()
+                api_user_id = user_data["id"]
+                is_approved = user_data.get("is_approved", False)
+
+                # If not auto-approved, approve via admin endpoint
+                if not is_approved and self.admin_client:
+                    try:
+                        # Approve user via admin API
+                        admin_response = requests.post(
+                            f"{self.base_url}/admin/approve-user",
+                            headers={"Authorization": f"Bearer {self.admin_client.access_token}"},
+                            json={
+                                "user_id": api_user_id,
+                                "approve": True,
+                                "notes": f"Auto-approved for local user {local_user_id}"
+                            },
+                            timeout=10
+                        )
+                        if admin_response.status_code == 200:
+                            is_approved = True
+                            logger.info(f"Auto-approved ZeroGPU user {api_user_id} for local user {local_user_id}")
+                    except Exception as e:
+                        logger.warning(f"Could not auto-approve user: {e}")
+
+                # Login to get tokens
+                login_response = requests.post(
+                    f"{self.base_url}/login",
+                    json={
+                        "email": creds["email"],
+                        "password": creds["password"]
+                    },
+                    timeout=10
+                )
+
+                if login_response.status_code == 200:
+                    login_data = login_response.json()
+
+                    # Create client, reusing the tokens from the login above
+                    client = ZeroGPUChatClient(
+                        self.base_url,
+                        creds["email"],
+                        creds["password"],
+                        login_data["access_token"],
+                        login_data["refresh_token"]
+                    )
+
+                    # Save mapping (store password hash, not plain password)
+                    self._save_user_mapping(
+                        local_user_id,
+                        api_user_id,
+                        creds["email"],
+                        creds["password_hash"],
+                        login_data["access_token"],
+                        login_data["refresh_token"]
+                    )
+                    # _save_user_mapping does not write is_approved, so persist it separately
+                    self._update_approval_status(local_user_id, is_approved)
+
+                    # Cache client
+                    self.user_clients[local_user_id] = client
+
+                    logger.info(f"✓ Registered and logged in ZeroGPU user for local user: {local_user_id}")
+                    return client
+                else:
+                    logger.error(f"Failed to login after registration: {login_response.text}")
+                    return None
+            else:
+                # User might already exist, try to login
+                if response.status_code == 400:
+                    logger.info(f"User {creds['email']} may already exist, attempting login...")
+                    login_response = requests.post(
+                        f"{self.base_url}/login",
+                        json={
+                            "email": creds["email"],
+                            "password": creds["password"]
+                        },
+                        timeout=10
+                    )
+
+                    if login_response.status_code == 200:
+                        login_data = login_response.json()
+                        user_info = requests.get(
+                            f"{self.base_url}/me",
+                            headers={"Authorization": f"Bearer {login_data['access_token']}"},
+                            timeout=10
+                        )
+
+                        if user_info.status_code == 200:
+                            user_data = user_info.json()
+                            client = ZeroGPUChatClient(
+                                self.base_url,
+                                creds["email"],
+                                creds["password"],
+                                login_data["access_token"],
+                                login_data["refresh_token"]
+                            )
+
+                            self._save_user_mapping(
+                                local_user_id,
+                                user_data["id"],
+                                creds["email"],
+                                creds["password_hash"],
+                                login_data["access_token"],
+                                login_data["refresh_token"]
+                            )
+
+                            self.user_clients[local_user_id] = client
+                            return client
+
+                logger.error(f"Failed to register user: {response.text}")
+                return None
+
+        except Exception as e:
+            logger.error(f"Error registering new user: {e}", exc_info=True)
+            return None
+
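`_register_new_user` is declared `async` but issues its HTTP requests with the blocking `requests` library, so the event loop is stalled for the duration of each call. One possible mitigation, sketched here with a stand-in `blocking_call` function rather than real network traffic, is to hand the blocking work to a worker thread with `asyncio.to_thread` (available from Python 3.9). The function names are illustrative only.

```python
import asyncio
import time
from typing import List


def blocking_call(delay: float) -> str:
    """Stand-in for a blocking HTTP request such as requests.post(...)."""
    time.sleep(delay)
    return "done"


async def call_without_blocking(delay: float) -> str:
    # asyncio.to_thread runs the blocking function in a worker thread,
    # leaving the event loop free to schedule other coroutines.
    return await asyncio.to_thread(blocking_call, delay)


async def main() -> List[str]:
    # The two calls overlap instead of running back to back.
    return await asyncio.gather(
        call_without_blocking(0.05),
        call_without_blocking(0.05),
    )
```

Running `asyncio.run(main())` completes both calls concurrently; an async HTTP client would be another option, at the cost of a new dependency.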
+    def get_user_stats(self, local_user_id: str) -> Optional[Dict[str, Any]]:
+        """Get usage statistics for a user"""
+        mapping = self._get_user_mapping(local_user_id)
+        if not mapping or not mapping.get("api_user_id"):
+            return None
+
+        # Get client
+        client = self.user_clients.get(local_user_id)
+        if not client:
+            return None
+
+        try:
+            return client.get_usage_stats(days=30)
+        except Exception as e:
+            logger.error(f"Error getting user stats: {e}")
+            return None
+
+    def cleanup_expired_clients(self):
+        """Remove expired clients from cache"""
+        # Not implemented yet: intended to evict clients that have not been used recently.
+        # In production, implement more sophisticated cache management.
+        pass
+
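`cleanup_expired_clients` is left as a stub. A minimal sketch of time-based eviction follows, assuming a hypothetical standalone `ClientCache` class that tracks a last-used timestamp per user; the class, its method names, and the 900-second default are illustrative and not part of this commit.

```python
import time
from typing import Any, Dict, Optional


class ClientCache:
    """Hypothetical per-user client cache with idle-time eviction."""

    def __init__(self, ttl_seconds: float = 900.0):
        self.ttl_seconds = ttl_seconds
        self._clients: Dict[str, Any] = {}
        self._last_used: Dict[str, float] = {}

    def put(self, user_id: str, client: Any, now: Optional[float] = None) -> None:
        self._clients[user_id] = client
        self._last_used[user_id] = time.monotonic() if now is None else now

    def get(self, user_id: str, now: Optional[float] = None) -> Optional[Any]:
        client = self._clients.get(user_id)
        if client is not None:
            # Each access refreshes the idle timer for that user.
            self._last_used[user_id] = time.monotonic() if now is None else now
        return client

    def cleanup_expired(self, now: Optional[float] = None) -> int:
        """Drop clients idle for longer than ttl_seconds; return how many were removed."""
        current = time.monotonic() if now is None else now
        expired = [
            user_id
            for user_id, last in self._last_used.items()
            if current - last > self.ttl_seconds
        ]
        for user_id in expired:
            del self._clients[user_id]
            del self._last_used[user_id]
        return len(expired)
```

The optional `now` argument exists only to make the behaviour easy to exercise without waiting; in normal use the monotonic clock is read on each call.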