Flask API Documentation
Overview
The Research AI Assistant API provides a RESTful interface for interacting with an AI-powered research assistant. The API uses local GPU models for inference and supports conversational interactions with context management.
Base URL (HF Spaces): https://jatinautonomouslabs-research-ai-assistant-api.hf.space
Alternative Base URL: https://huggingface.co/spaces/JatinAutonomousLabs/Research_AI_Assistant_API
API Version: 1.0
Content-Type: application/json
Note: For Hugging Face Spaces Docker deployments, use the `.hf.space` domain format. The space name is converted to lowercase with hyphens.
Features
- 🤖 AI-Powered Responses - Local GPU model inference (Tesla T4)
- 💬 Conversational Context - Maintains conversation history and user context
- 🔒 CORS Enabled - Ready for web integration
- ⚡ Async Processing - Efficient request handling
- 📊 Transparent Reasoning - Returns reasoning chains and performance metrics
Authentication
Currently, the API does not require authentication. However, for production use, you should:
- Set the `HF_TOKEN` environment variable for Hugging Face model access
- Implement API key authentication if needed
Endpoints
1. Get API Information
Endpoint: GET /
Description: Returns API information, version, and available endpoints.
Request:
GET / HTTP/1.1
Host: jatinautonomouslabs-research-ai-assistant-api.hf.space
Response:
{
  "name": "AI Assistant Flask API",
  "version": "1.0",
  "status": "running",
  "orchestrator_ready": true,
  "features": {
    "local_gpu_models": true,
    "max_workers": 4,
    "hardware": "NVIDIA T4 Medium"
  },
  "endpoints": {
    "health": "GET /api/health",
    "chat": "POST /api/chat",
    "initialize": "POST /api/initialize",
    "context_mode_get": "GET /api/context/mode",
    "context_mode_set": "POST /api/context/mode"
  }
}
Status Codes:
- `200 OK` - Success
2. Health Check
Endpoint: GET /api/health
Description: Checks if the API and orchestrator are ready to handle requests.
Request:
GET /api/health HTTP/1.1
Host: jatinautonomouslabs-research-ai-assistant-api.hf.space
Response:
{
  "status": "healthy",
  "orchestrator_ready": true
}
Status Codes:
- `200 OK` - API is healthy
  - `orchestrator_ready: true` - Ready to process requests
  - `orchestrator_ready: false` - Still initializing
Example Response (Initializing):
{
  "status": "initializing",
  "orchestrator_ready": false
}
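Since the orchestrator loads models at startup, a client can poll this endpoint until it reports ready. The helper below is an illustrative sketch, not part of the API; `check_health` stands for any callable returning the parsed `/api/health` JSON (for example, the `check_health` function from the code examples):

```python
import time

def wait_until_ready(check_health, timeout=60, interval=2):
    """Poll a health-check callable until orchestrator_ready is true.

    check_health: zero-argument callable returning the parsed /api/health
    JSON, e.g. lambda: requests.get(f"{BASE_URL}/api/health").json()
    Returns True once ready, or False if `timeout` seconds elapse first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if check_health().get("orchestrator_ready"):
                return True
        except Exception:
            pass  # transient network error while the Space boots; keep polling
        time.sleep(interval)
    return False
```

The timeout and interval defaults are assumptions; tune them to the Space's observed startup time (initialization typically takes 30-60 seconds).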
3. Chat Endpoint
Endpoint: POST /api/chat
Description: Send a message to the AI assistant and receive a response with reasoning and context.
Request Headers:
Content-Type: application/json
Request Body:
{
  "message": "Explain quantum entanglement in simple terms",
  "history": [
    ["User message 1", "Assistant response 1"],
    ["User message 2", "Assistant response 2"]
  ],
  "session_id": "session-123",
  "user_id": "user-456"
}
Request Fields:
| Field | Type | Required | Description |
|---|---|---|---|
| `message` | string | ✅ Yes | User's message/question (max 10,000 characters) |
| `history` | array | ❌ No | Conversation history as array of [user, assistant] pairs |
| `session_id` | string | ❌ No | Unique session identifier for context continuity |
| `user_id` | string | ❌ No | User identifier (defaults to "anonymous") |
| `context_mode` | string | ❌ No | Context retrieval mode: "fresh" (no user context) or "relevant" (only relevant context). Defaults to "fresh" if not set. |
Response (Success):
{
  "success": true,
  "message": "Quantum entanglement is when two particles become linked...",
  "history": [
    ["Explain quantum entanglement", "Quantum entanglement is when two particles become linked..."]
  ],
  "reasoning": {
    "intent": "educational_query",
    "steps": ["Understanding request", "Gathering information", "Synthesizing response"],
    "confidence": 0.95
  },
  "performance": {
    "response_time_ms": 2345,
    "tokens_generated": 156,
    "model_used": "mistralai/Mistral-7B-Instruct-v0.2"
  }
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `success` | boolean | Whether the request was successful |
| `message` | string | AI assistant's response |
| `history` | array | Updated conversation history including the new exchange |
| `reasoning` | object | AI reasoning process and confidence metrics |
| `performance` | object | Performance metrics (response time, tokens, model used) |
Status Codes:
- `200 OK` - Request processed successfully
- `400 Bad Request` - Invalid request (missing message, empty message, too long, wrong type)
- `500 Internal Server Error` - Server error processing request
- `503 Service Unavailable` - Orchestrator not ready (still initializing)
Error Response:
{
  "success": false,
  "error": "Message is required",
  "message": "Error processing your request. Please try again."
}
Context Mode Feature:
The `context_mode` parameter controls how user context is retrieved and used:
- `"fresh"` (default): No user context is included. Each conversation starts fresh. Ideal for:
  - General questions requiring no prior context
  - Avoiding context contamination
  - Faster responses (no context retrieval overhead)
- `"relevant"`: Only relevant user context is included based on relevance classification. The system:
  - Analyzes all previous interactions for the session
  - Classifies which interactions are relevant to the current query
  - Includes only relevant context summaries

  Ideal for:
  - Follow-up questions that build on previous conversations
  - Maintaining continuity within a research session
  - Personalized responses based on user history
Example with Context Mode:
{
  "message": "Can you remind me what we discussed about quantum computing?",
  "session_id": "session-123",
  "user_id": "user-456",
  "context_mode": "relevant"
}
4. Initialize Orchestrator
Endpoint: POST /api/initialize
Description: Manually trigger orchestrator initialization (useful if initialization failed on startup).
Request:
POST /api/initialize HTTP/1.1
Host: jatinautonomouslabs-research-ai-assistant-api.hf.space
Content-Type: application/json
Request Body:
{}
Response (Success):
{
  "success": true,
  "message": "Orchestrator initialized successfully"
}
Response (Failure):
{
  "success": false,
  "message": "Initialization failed. Check logs for details."
}
Status Codes:
- `200 OK` - Initialization successful
- `500 Internal Server Error` - Initialization failed
5. Get Context Mode
Endpoint: GET /api/context/mode
Description: Retrieve the current context retrieval mode for a session.
Request:
GET /api/context/mode?session_id=session-123 HTTP/1.1
Host: jatinautonomouslabs-research-ai-assistant-api.hf.space
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `session_id` | string | ✅ Yes | Session identifier |
Response (Success):
{
  "success": true,
  "session_id": "session-123",
  "context_mode": "fresh",
  "description": {
    "fresh": "No user context included - starts fresh each time",
    "relevant": "Only relevant user context included based on relevance classification"
  }
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `success` | boolean | Whether the request was successful |
| `session_id` | string | Session identifier |
| `context_mode` | string | Current mode: "fresh" or "relevant" |
| `description` | object | Description of each mode |
Status Codes:
- `200 OK` - Success
- `400 Bad Request` - Missing `session_id` parameter
- `500 Internal Server Error` - Server error
- `503 Service Unavailable` - Orchestrator not ready or context mode not available
Error Response:
{
  "success": false,
  "error": "session_id query parameter is required"
}
6. Set Context Mode
Endpoint: POST /api/context/mode
Description: Set the context retrieval mode for a session (fresh or relevant).
Request Headers:
Content-Type: application/json
Request Body:
{
  "session_id": "session-123",
  "mode": "relevant",
  "user_id": "user-456"
}
Request Fields:
| Field | Type | Required | Description |
|---|---|---|---|
| `session_id` | string | ✅ Yes | Session identifier |
| `mode` | string | ✅ Yes | Context mode: "fresh" or "relevant" |
| `user_id` | string | ❌ No | User identifier (defaults to "anonymous") |
Response (Success):
{
  "success": true,
  "session_id": "session-123",
  "context_mode": "relevant",
  "message": "Context mode set successfully"
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| `success` | boolean | Whether the request was successful |
| `session_id` | string | Session identifier |
| `context_mode` | string | The mode that was set |
| `message` | string | Success message |
Status Codes:
- `200 OK` - Context mode set successfully
- `400 Bad Request` - Invalid request (missing fields, invalid mode)
- `500 Internal Server Error` - Server error or failed to set mode
- `503 Service Unavailable` - Orchestrator not ready or context mode not available
Error Response:
{
  "success": false,
  "error": "mode must be 'fresh' or 'relevant'"
}
Usage Notes:
- The context mode persists for the session until changed
- Setting `mode` to `"relevant"` enables relevance classification, which analyzes all previous interactions to include only relevant context
- Setting `mode` to `"fresh"` disables context retrieval, providing faster responses without user history
- The mode can also be set per-request via the `context_mode` parameter in `/api/chat`
Code Examples
Python
import requests

BASE_URL = "https://jatinautonomouslabs-research-ai-assistant-api.hf.space"

# Check health
def check_health():
    response = requests.get(f"{BASE_URL}/api/health")
    return response.json()

# Send chat message
def send_message(message, session_id=None, user_id=None, history=None, context_mode=None):
    payload = {
        "message": message,
        "session_id": session_id,
        "user_id": user_id or "anonymous",
        "history": history or []
    }
    if context_mode:
        payload["context_mode"] = context_mode
    response = requests.post(
        f"{BASE_URL}/api/chat",
        json=payload,
        headers={"Content-Type": "application/json"}
    )
    if response.status_code == 200:
        return response.json()
    else:
        raise Exception(f"API Error: {response.status_code} - {response.text}")

# Example usage
if __name__ == "__main__":
    # Check if API is ready
    health = check_health()
    print(f"API Status: {health}")
    if health.get("orchestrator_ready"):
        # Send a message
        result = send_message(
            message="What is machine learning?",
            session_id="my-session-123",
            user_id="user-456"
        )
        print(f"Response: {result['message']}")
        print(f"Reasoning: {result.get('reasoning', {})}")

        # Set context mode to relevant for follow-up
        requests.post(
            f"{BASE_URL}/api/context/mode",
            json={
                "session_id": "my-session-123",
                "mode": "relevant",
                "user_id": "user-456"
            }
        )

        # Continue conversation with relevant context
        history = result['history']
        result2 = send_message(
            message="Can you explain neural networks?",
            session_id="my-session-123",
            user_id="user-456",
            history=history,
            context_mode="relevant"
        )
        print(f"Follow-up Response: {result2['message']}")
JavaScript (Fetch API)
const BASE_URL = 'https://jatinautonomouslabs-research-ai-assistant-api.hf.space';

// Check health
async function checkHealth() {
  const response = await fetch(`${BASE_URL}/api/health`);
  return await response.json();
}

// Get context mode for a session
async function getContextMode(sessionId) {
  const response = await fetch(`${BASE_URL}/api/context/mode?session_id=${sessionId}`);
  if (!response.ok) {
    throw new Error(`API Error: ${response.status}`);
  }
  return await response.json();
}

// Set context mode for a session
async function setContextMode(sessionId, mode, userId = null) {
  const payload = {
    session_id: sessionId,
    mode: mode
  };
  if (userId) {
    payload.user_id = userId;
  }
  const response = await fetch(`${BASE_URL}/api/context/mode`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(payload)
  });
  if (!response.ok) {
    const error = await response.json();
    throw new Error(`API Error: ${response.status} - ${error.error || error.message}`);
  }
  return await response.json();
}

// Send chat message
async function sendMessage(message, sessionId = null, userId = null, history = [], contextMode = null) {
  const payload = {
    message: message,
    session_id: sessionId,
    user_id: userId || 'anonymous',
    history: history
  };
  if (contextMode) {
    payload.context_mode = contextMode;
  }
  const response = await fetch(`${BASE_URL}/api/chat`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(payload)
  });
  if (!response.ok) {
    const error = await response.json();
    throw new Error(`API Error: ${response.status} - ${error.error || error.message}`);
  }
  return await response.json();
}

// Example usage
async function main() {
  try {
    // Check if API is ready
    const health = await checkHealth();
    console.log('API Status:', health);
    if (health.orchestrator_ready) {
      // Send a message
      const result = await sendMessage(
        'What is machine learning?',
        'my-session-123',
        'user-456'
      );
      console.log('Response:', result.message);
      console.log('Reasoning:', result.reasoning);

      // Continue conversation with relevant context
      await setContextMode('my-session-123', 'relevant', 'user-456');
      const result2 = await sendMessage(
        'Can you explain neural networks?',
        'my-session-123',
        'user-456',
        result.history,
        'relevant'
      );
      console.log('Follow-up Response:', result2.message);

      // Check current context mode
      const modeInfo = await getContextMode('my-session-123');
      console.log('Current context mode:', modeInfo.context_mode);
    }
  } catch (error) {
    console.error('Error:', error);
  }
}

main();
cURL
# Check health
curl -X GET "https://jatinautonomouslabs-research-ai-assistant-api.hf.space/api/health"
# Get context mode
curl -X GET "https://jatinautonomouslabs-research-ai-assistant-api.hf.space/api/context/mode?session_id=my-session-123"
# Set context mode to relevant
curl -X POST "https://jatinautonomouslabs-research-ai-assistant-api.hf.space/api/context/mode" \
-H "Content-Type: application/json" \
-d '{
"session_id": "my-session-123",
"mode": "relevant",
"user_id": "user-456"
}'
# Send chat message
curl -X POST "https://jatinautonomouslabs-research-ai-assistant-api.hf.space/api/chat" \
-H "Content-Type: application/json" \
-d '{
"message": "What is machine learning?",
"session_id": "my-session-123",
"user_id": "user-456",
"context_mode": "relevant",
"history": []
}'
# Continue conversation
curl -X POST "https://jatinautonomouslabs-research-ai-assistant-api.hf.space/api/chat" \
-H "Content-Type: application/json" \
-d '{
"message": "Can you explain neural networks?",
"session_id": "my-session-123",
"user_id": "user-456",
"history": [
["What is machine learning?", "Machine learning is a subset of artificial intelligence..."]
]
}'
Node.js (Axios)
const axios = require('axios');

const BASE_URL = 'https://jatinautonomouslabs-research-ai-assistant-api.hf.space';

// Check health
async function checkHealth() {
  const response = await axios.get(`${BASE_URL}/api/health`);
  return response.data;
}

// Get context mode
async function getContextMode(sessionId) {
  const response = await axios.get(`${BASE_URL}/api/context/mode`, {
    params: { session_id: sessionId }
  });
  return response.data;
}

// Set context mode
async function setContextMode(sessionId, mode, userId = null) {
  const payload = {
    session_id: sessionId,
    mode: mode
  };
  if (userId) payload.user_id = userId;
  const response = await axios.post(`${BASE_URL}/api/context/mode`, payload);
  return response.data;
}

// Send chat message
async function sendMessage(message, sessionId = null, userId = null, history = [], contextMode = null) {
  try {
    const payload = {
      message: message,
      session_id: sessionId,
      user_id: userId || 'anonymous',
      history: history
    };
    if (contextMode) payload.context_mode = contextMode;
    const response = await axios.post(`${BASE_URL}/api/chat`, payload, {
      headers: {
        'Content-Type': 'application/json'
      }
    });
    return response.data;
  } catch (error) {
    if (error.response) {
      throw new Error(`API Error: ${error.response.status} - ${error.response.data.error || error.response.data.message}`);
    }
    throw error;
  }
}

// Example usage
(async () => {
  try {
    const health = await checkHealth();
    console.log('API Status:', health);
    if (health.orchestrator_ready) {
      // Set context mode to relevant
      await setContextMode('my-session-123', 'relevant', 'user-456');
      const result = await sendMessage(
        'What is machine learning?',
        'my-session-123',
        'user-456',
        [],
        'relevant'
      );
      console.log('Response:', result.message);

      // Check current mode
      const modeInfo = await getContextMode('my-session-123');
      console.log('Context mode:', modeInfo.context_mode);
    }
  } catch (error) {
    console.error('Error:', error.message);
  }
})();
Error Handling
Common Error Responses
400 Bad Request
Missing Message:
{
  "success": false,
  "error": "Message is required"
}
Empty Message:
{
  "success": false,
  "error": "Message cannot be empty"
}
Message Too Long:
{
  "success": false,
  "error": "Message too long. Maximum length is 10000 characters"
}
Invalid Type:
{
  "success": false,
  "error": "Message must be a string"
}
503 Service Unavailable
Orchestrator Not Ready:
{
  "success": false,
  "error": "Orchestrator not ready",
  "message": "AI system is initializing. Please try again in a moment."
}
Solution: Wait a few seconds and retry, or check the /api/health endpoint.
500 Internal Server Error
Generic Error:
{
  "success": false,
  "error": "Error message here",
  "message": "Error processing your request. Please try again."
}
Best Practices
1. Session Management
- Use consistent session IDs for maintaining conversation context
- Generate unique session IDs per user conversation thread
- Include conversation history in subsequent requests for better context
# Good: Maintains context
session_id = "user-123-session-1"
history = []
# First message
result1 = send_message("What is AI?", session_id=session_id, history=history)
history = result1['history']
# Follow-up message (includes context)
result2 = send_message("Can you explain more?", session_id=session_id, history=history)
2. Error Handling
Always implement retry logic for 503 errors:
import time

def send_message_with_retry(message, max_retries=3, retry_delay=2):
    for attempt in range(max_retries):
        try:
            result = send_message(message)
            return result
        except Exception as e:
            if "503" in str(e) and attempt < max_retries - 1:
                time.sleep(retry_delay)
                continue
            raise
3. Health Checks
Check API health before sending requests:
def is_api_ready():
    try:
        health = check_health()
        return health.get("orchestrator_ready", False)
    except Exception:
        return False

if is_api_ready():
    # Send request
    result = send_message("Hello")
else:
    print("API is not ready yet")
4. Rate Limiting
- No explicit rate limits are currently enforced
- Recommended: Implement client-side rate limiting (e.g., 1 request per second)
- Consider: Implementing request queuing for high-volume applications
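One way to follow the 1-request-per-second recommendation is a small client-side throttle. This is an illustrative sketch, not an API feature:

```python
import time

class RateLimiter:
    """Enforce a minimum interval between successive calls (client-side)."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self._last = None  # monotonic timestamp of the previous call

    def wait(self):
        """Block just long enough to keep min_interval between calls."""
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

Call `limiter.wait()` immediately before each request; the first call passes through, and later calls sleep only as long as needed to respect the interval.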
5. Message Length
- Maximum: 10,000 characters per message
- Recommended: Keep messages concise for faster processing
- For long content: Split into multiple messages or summarize
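When content exceeds the 10,000-character cap, it can be split client-side before sending. The helper below is a hedged sketch (a plain splitter preferring paragraph, then word boundaries); it is not part of the API:

```python
def split_message(text, limit=10_000):
    """Split text into chunks of at most `limit` characters."""
    chunks = []
    while len(text) > limit:
        cut = text.rfind("\n\n", 0, limit)  # prefer a paragraph boundary
        if cut <= 0:
            cut = text.rfind(" ", 0, limit)  # fall back to a word boundary
        if cut <= 0:
            cut = limit                      # hard cut as a last resort
        chunks.append(text[:cut].strip())
        text = text[cut:].strip()
    if text:
        chunks.append(text)
    return chunks
```

Each chunk can then be sent as its own `/api/chat` request, carrying the running `history` forward so the assistant sees the pieces in order.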
6. Context Management
- Include history in requests to maintain conversation context
- Session IDs help track conversations across multiple requests
- User IDs enable personalization and user-specific context
Integration Examples
React Component
import React, { useState } from 'react';

const AIAssistant = () => {
  const [message, setMessage] = useState('');
  const [history, setHistory] = useState([]);
  const [loading, setLoading] = useState(false);
  const [sessionId] = useState(`session-${Date.now()}`);

  const sendMessage = async () => {
    if (!message.trim()) return;
    setLoading(true);
    try {
      const response = await fetch('https://jatinautonomouslabs-research-ai-assistant-api.hf.space/api/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          message: message,
          session_id: sessionId,
          user_id: 'user-123',
          history: history
        })
      });
      const data = await response.json();
      if (data.success) {
        setHistory(data.history);
        setMessage('');
      }
    } catch (error) {
      console.error('Error:', error);
    } finally {
      setLoading(false);
    }
  };

  return (
    <div>
      <div className="chat-history">
        {history.map(([user, assistant], idx) => (
          <div key={idx}>
            <div><strong>You:</strong> {user}</div>
            <div><strong>Assistant:</strong> {assistant}</div>
          </div>
        ))}
      </div>
      <input
        value={message}
        onChange={(e) => setMessage(e.target.value)}
        onKeyPress={(e) => e.key === 'Enter' && sendMessage()}
        disabled={loading}
      />
      <button onClick={sendMessage} disabled={loading}>
        {loading ? 'Sending...' : 'Send'}
      </button>
    </div>
  );
};
Python CLI Tool
#!/usr/bin/env python3
import requests

BASE_URL = "https://jatinautonomouslabs-research-ai-assistant-api.hf.space"

class ChatCLI:
    def __init__(self):
        self.session_id = f"cli-session-{hash(__file__)}"
        self.history = []

    def chat(self, message):
        response = requests.post(
            f"{BASE_URL}/api/chat",
            json={
                "message": message,
                "session_id": self.session_id,
                "user_id": "cli-user",
                "history": self.history
            }
        )
        if response.status_code == 200:
            data = response.json()
            self.history = data['history']
            return data['message']
        else:
            return f"Error: {response.status_code} - {response.text}"

    def run(self):
        print("AI Assistant CLI (Type 'exit' to quit)")
        print("=" * 50)
        while True:
            user_input = input("\nYou: ").strip()
            if user_input.lower() in ['exit', 'quit']:
                break
            print("Assistant: ", end="", flush=True)
            response = self.chat(user_input)
            print(response)

if __name__ == "__main__":
    cli = ChatCLI()
    cli.run()
Response Times
- Typical Response: 2-10 seconds
- First Request: May take longer due to model loading (10-30 seconds)
- Subsequent Requests: Faster due to cached models (2-5 seconds)
Factors Affecting Response Time:
- Message length
- Model loading (first request)
- GPU availability
- Concurrent requests
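These numbers can be checked client-side by timing each call. The wrapper below is an illustrative sketch that works with any request function (for example, the `send_message` helper from the code examples above):

```python
import time

def timed_call(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_ms)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms
```

For example, `result, ms = timed_call(send_message, "What is AI?")` lets you log client-observed latency alongside the server-reported `response_time_ms` (the difference is network and queuing overhead).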
Troubleshooting
Common Issues
404 Not Found
Problem: Getting 404 when accessing the API
Solutions:
Verify the Space is running:
- Check the Hugging Face Space page to ensure it's built and running
- Wait for the initial build to complete (5-10 minutes)
Check URL format:
- ✅ Correct: `https://jatinautonomouslabs-research-ai-assistant-api.hf.space`
- ❌ Wrong: `https://jatinautonomouslabs-research_ai_assistant_api.hf.space` (underscores)
- ✅ Alternative: `https://huggingface.co/spaces/JatinAutonomousLabs/Research_AI_Assistant_API`

Verify endpoint paths:
- Health: `GET /api/health`
- Chat: `POST /api/chat`
- Root: `GET /`

Test with root endpoint first:
curl https://jatinautonomouslabs-research-ai-assistant-api.hf.space/
503 Service Unavailable
Problem: Orchestrator not ready
Solutions:
- Wait 30-60 seconds for initialization
- Check the `/api/health` endpoint
- Use `/api/initialize` to manually trigger initialization
CORS Errors
Problem: CORS errors in browser
Solutions:
- The API has CORS enabled for all origins
- If issues persist, check browser console for specific errors
- Ensure you're using the correct base URL
Testing API Connectivity
Quick Health Check:
# Test root endpoint
curl https://jatinautonomouslabs-research-ai-assistant-api.hf.space/
# Test health endpoint
curl https://jatinautonomouslabs-research-ai-assistant-api.hf.space/api/health
Python Test Script:
import requests

BASE_URL = "https://jatinautonomouslabs-research-ai-assistant-api.hf.space"

# Test root
try:
    response = requests.get(f"{BASE_URL}/", timeout=10)
    print(f"Root endpoint: {response.status_code} - {response.json()}")
except Exception as e:
    print(f"Root endpoint failed: {e}")

# Test health
try:
    response = requests.get(f"{BASE_URL}/api/health", timeout=10)
    print(f"Health endpoint: {response.status_code} - {response.json()}")
except Exception as e:
    print(f"Health endpoint failed: {e}")
Support
For issues, questions, or contributions:
- Repository: [GitHub Repository URL]
- Hugging Face Space: https://huggingface.co/spaces/JatinAutonomousLabs/Research_AI_Assistant_API
Changelog
Version 1.0 (Current)
- Initial API release
- Chat endpoint with context management
- Health check endpoint
- Local GPU model inference
- CORS enabled for web integration
License
This API is provided as-is. Please refer to the main project README for license information.