# 🧠 ToGMAL Prompt Difficulty Analyzer
Real-time LLM capability boundary detection using vector similarity search.
## 🎯 What This Does
This system analyzes any prompt and tells you:
1. **How difficult it is** for current LLMs (based on real benchmark data)
2. **Why it's difficult** (shows similar benchmark questions)
3. **What to do about it** (actionable recommendations)
## 🔥 Key Innovation
Instead of clustering by domain (all math questions together), we cluster by **difficulty**: what is actually hard for LLMs, regardless of domain.
## Real Data
- **14,042 MMLU questions** with real success rates from top models
- **<50ms query time** for real-time analysis
- **Production-ready** vector database
## Demo Links
- **Local**: http://127.0.0.1:7860
- **Public**: https://99b38fc2e31da2f83d.gradio.live (temporary Gradio share link, so it may no longer be reachable)
## 🧪 Example Results
### Hard Questions (Low Success Rates)
```
Prompt: "Statement 1 | Every field is also a ring..."
Risk: HIGH (23.9% success)
Recommendation: Multi-step reasoning with verification
Prompt: "Find all zeros of polynomial x³ + 2x + 2 in Z₇"
Risk: MODERATE (43.8% success)
Recommendation: Use chain-of-thought prompting
```
### Easy Questions (High Success Rates)
```
Prompt: "What is 2 + 2?"
Risk: MINIMAL (100% success)
Recommendation: Standard LLM response adequate
Prompt: "What is the capital of France?"
Risk: MINIMAL (100% success)
Recommendation: Standard LLM response adequate
```
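The risk labels in the examples above are consistent with a simple threshold mapping from estimated success rate to a tier. The following is a minimal sketch under that assumption: the function name `risk_level` and the cutoffs (0.30 and 0.70) are hypothetical, chosen only so that the examples shown here land in the same tiers. The analyzer's actual thresholds, and any additional tiers it may use, are not stated in this README.

```python
def risk_level(success_rate: float) -> str:
    """Map an estimated success rate in [0, 1] to a risk label.

    The cutoffs below are illustrative assumptions, not the project's values.
    """
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    if success_rate < 0.30:   # assumed cutoff for HIGH risk
        return "HIGH"
    if success_rate < 0.70:   # assumed cutoff for MODERATE risk
        return "MODERATE"
    return "MINIMAL"


if __name__ == "__main__":
    # Success rates taken from the example results above.
    for rate in (0.239, 0.438, 1.0):
        print(f"{rate:.1%} -> {risk_level(rate)}")
```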
## 🛠️ Technical Details
### Architecture
```
User Prompt → Embedding Model → Vector DB → K Nearest Questions → Weighted Score
```
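The last two stages of this pipeline (K nearest questions, then a weighted score) can be illustrated without any external services. The sketch below is an assumption about how the weighting might work, not the project's implementation: it uses cosine similarity as the neighbor weight and returns a similarity-weighted average of the neighbors' success rates. The names `cosine_similarity` and `weighted_success_rate` are hypothetical, and plain Python lists stand in for the embedding model and vector database.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def weighted_success_rate(
    query: Sequence[float],
    indexed: List[Tuple[Sequence[float], float]],
    k: int = 3,
) -> float:
    """Estimate a success rate from the k most similar indexed questions.

    `indexed` holds (embedding, success_rate) pairs. Each neighbor's success
    rate is weighted by its similarity to the query (assumed scheme).
    """
    if not indexed or k < 1:
        raise ValueError("need at least one indexed question and k >= 1")
    # Score every indexed question, then keep the k most similar.
    scored = sorted(
        ((cosine_similarity(query, emb), rate) for emb, rate in indexed),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    # Negative similarities contribute no weight.
    weights = [max(sim, 0.0) for sim, _ in scored]
    total = sum(weights)
    if total == 0.0:
        # No neighbor is similar at all: fall back to an unweighted mean.
        return sum(rate for _, rate in scored) / len(scored)
    return sum(w * rate for w, (_, rate) in zip(weights, scored)) / total


if __name__ == "__main__":
    # Toy 2-D "embeddings" with made-up success rates, for illustration only.
    toy_index = [([1.0, 0.0], 0.2), ([0.0, 1.0], 0.9), ([1.0, 1.0], 0.5)]
    print(weighted_success_rate([1.0, 0.0], toy_index, k=2))
```

In a real deployment the brute-force sort would be replaced by the vector database's nearest-neighbor query, and the embeddings would come from the sentence-embedding model rather than hand-written vectors.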
### Components
1. **Sentence Transformers** (all-MiniLM-L6-v2) for embeddings
2. **ChromaDB** for vector storage
3. **Real MMLU data** with success rates from top models
4. **Gradio** for web interface
## Next Steps
1. Add more benchmark datasets (GPQA, MATH)
2. Fetch real per-question results from multiple top models
3. Integrate with ToGMAL MCP server for Claude Desktop
4. Deploy to HuggingFace Spaces for permanent hosting
## Quick Start
```bash
# Install dependencies
uv pip install -r requirements.txt
uv pip install gradio
# Run the demo
python demo_app.py
```
Visit http://127.0.0.1:7860 to use the web interface.